Video structuring device and method

ABSTRACT

A video structuring device includes: character string extraction means for determining whether or not a character string is present in a frame image, and if it determines that a character string is present, generating character string position information for the character string present in a character string present frame image in which the character string is present, and outputting the character string position information, frame identifying information for identifying the character string present frame image, and the character string present frame image; video information storage means for storing the frame identifying information, the character string present frame image and the character string position information in an index file, all associated with one another; and structure information presentation means for displaying, on display means and in association with the frame identifying information, a character string display in the form of an image produced by cutting out an area where the character string is present, based on the character string present frame image and the character string position information stored in the index file.

TECHNICAL FIELD

The present invention relates to archiving and monitoring of videos as well as a method for presenting structure information on video contents, and more particularly, to a video structuring device and method for efficiently accessing a certain portion in a video.

RELATED ART

Along with the recent development of digital video technologies, a large amount of video has been accumulated in storage devices such as hard disks as moving picture files. Because a moving picture file can contain images from many different time series, it is generally difficult to search a moving picture file for desired video content.

As an example of a method for presenting structure information relating to video contents, a television signal recording/playback apparatus disclosed in JP-A-2004-080587 is known. This television signal recording/playback apparatus includes: a recording/playback unit for writing digital video signals which are digital television signals for each television program or reading out written digital video signals for each television program; a control unit for performing writing and reading processing of digital video signals; a thumbnail generation unit for generating a thumbnail image having a reduced screen size from a screen of at least one frame at any point within each television program out of digital video signals read out by the recording/playback unit; and a thumbnail composition unit for composing and outputting a thumbnail list screen from thumbnail images for individual programs generated by the thumbnail generation unit. The recording/playback unit has therein a thumbnail list area which stores the thumbnail list screen. The control unit generates a thumbnail image by means of the thumbnail generation unit each time digital video signals for one program are written to the recording/playback unit, composes a thumbnail list screen from the generated thumbnail image of each program by means of the thumbnail composition unit, and stores the composed thumbnail list screen in the thumbnail list area. This television signal recording/playback apparatus produces a thumbnail image from the first frame of a program, or from a screen of one or more frames at a certain point in time, such as a screen five minutes after the start of a program, by utilizing a timer and the like.

However, since the television signal recording/playback apparatus disclosed in JP-A-2004-080587 utilizes a plurality of frame images as thumbnails at certain time intervals or at the time of scene change, it does not always ensure that an index properly representing the image content is structured in association with a video source. Consequently, the television signal recording/playback apparatus has a problem of inefficient access to an image desired by a user, because a specific portion of a video file required by the user is not likely to appear in an index.

As a method for recognizing telop (subtitle) characters in a video, JP-A-11-167583 discloses a method in which a video is first fed to a video storage medium and to a telop character recognition and search terminal. On the image storage medium side, an image storage unit stores the video as well as ID information as of the point of accumulation of the video. On the telop character recognition and search terminal side, processing for detecting a telop character display frame, extracting a telop character area, and recognizing telop characters is carried out. An index file storage unit stores the result of the telop character recognition and ID information as of the point of display of the telop characters as an index file. As the ID information, time information may be stored, for example, and as the result of telop character recognition, character codes may be outputted, for example. When the user enters character codes for his or her desired video on a video search information input/storage unit of the video search terminal from an interface, e.g., a WWW (World Wide Web) browser, the input character codes are searched for in the index files stored in the index file storage unit of the telop character recognition and search terminal, and a video having corresponding ID information is retrieved from the video storage unit. As a result, the retrieved video is displayed on a video display unit of the video search terminal, e.g., a computer display.

With the system based on JP-A-11-167583, however, telop characters contained in index files are likely to include misrecognitions because they are text information obtained from character recognition. Because meaningless text information resulting from such misrecognitions appears in indices, the system has a problem of low search efficiency when the user selects a desired scene.

JP-A-2003-345809 discloses a database construction system that includes: an audio transcription device for transcribing news audio corresponding to a news video into character strings; a character recognition device for detecting a character appearance section in which a character string appears in the news video and recognizing the character string; a retrieval device for determining the degree of similarity among words contained in the result of audio transcription that corresponds to the character appearance section detected by the character recognition device, and retrieving a passage similar to the character string recognized by the character recognition device from the result of audio transcription by utilizing the degree of similarity; and a registration device for registering in a database the recognition result from the character recognition device and a news video corresponding to the passage retrieved by the retrieval device in association with each other. This database construction system uses all the words contained in telops recognized by the character recognition device or in character strings of CG captions to perform passage retrieval on the transcription of news audio. By performing such passage retrieval, the database construction system reduces the risk of extracting an irrelevant sentence under the influence of a thesaurus for a single word and the risk of registering irrelevant news videos to the database. Since this database construction system provides a search result by passage, the context of the result is easy to understand, and news videos can be registered in the database in a manner that facilitates understanding of their context.

However, because character information that is not included in audio is not registered to the database, the database construction system of JP-A-2003-345809 has a problem of low search efficiency when the user selects a desired scene.

As an information management apparatus for managing image data, JP-A-2003-333265 discloses an information management apparatus that includes: an attribute extraction unit for receiving image data from outside and extracting attribute information of the image data from a predetermined portion of the image data; a notification destination storage unit for storing, in advance and in association with attribute information, a notification destination to which notification information indicating the receipt of image data should be sent; a notification destination determination unit for extracting a notification destination from the notification destination storage unit using the attribute information extracted by the attribute extraction unit; and an output unit for notifying the notification destination extracted by the notification destination determination unit of the notification information. This information management apparatus can, upon receipt of external information from outside, output information indicative of the receipt of the external information to a notification destination to which the information should be provided. Here, the output unit extracts internal information from an internal information storage unit based on an internal information ID, and stores the internal information in a view information database based on the notification destination together with relevant information and image data. The output unit is also capable of transmitting notification information indicating that image data has been received to a user terminal based on a notification destination received from the notification destination determination unit, and sending an internal information ID received from an internal information search unit to the user terminal together with the notification information.

However, the information management apparatus disclosed by JP-A-2003-333265 has a problem of inefficient access to a specific portion of a video desired by the user, because an index properly representing the image content is not structured in association with a video source.

As a method for clipping characters from an image, JP-A-3-141484 discloses a character clipping method that, when the number of characters included in a character string is known, optically reads the character string and clips out a partial screen which corresponds to one character from the image of the character string. This character clipping method extracts a one-dimensional serial feature from a character string image, and also defines a model function that can determine a character clipping position which corresponds to the number of characters and the one-dimensional serial feature. The method then non-linearly matches the one-dimensional serial feature with the model function, determines a character clipping position within the character string image which corresponds to the character clipping position of the model function from a non-linear correspondence function in the non-linear matching, and then clips out a partial image corresponding to one character from the character clipping position determined. This character clipping method can, when the number of characters included in a character string is given, clip out characters one by one from a character string image which has relatively large variation of character width and/or spacing or in which characters are in contact with each other, and do so with a relatively small number of parameters and in a simple way.

However, the character clipping apparatus of JP-A-3-141484 can have a problem of inefficient access to a specific portion of a video required by a user, because an index properly representing the image content is not structured in association with a video source.

As a fast recognition and retrieval system, JP-A-2001-034709 discloses a fast recognition and retrieval system that generates a feature vector from an input character pattern, identifies the feature vector in accordance with a condition stored in each node of a decision tree prepared in advance, sequentially selects a child node in accordance with the result of the identification, and repeats this classification until it reaches the terminal node. This fast recognition and retrieval system includes: generation means for generating a template of a multi-dimensional feature vector stored in a recognition dictionary from a set of patterns to which a predetermined correct answer category has been given; template dictionary storage means for storing a template generated by the generation means and a pattern that contributed to the generation of the template in association with each other; subset generation means for classifying a set of currently targeted templates and patterns corresponding to each of the templates, and the occurrence frequency of a correct answer category into subsets, and outputting templates that belong to subsets as well as a threshold value for separation into a subset; hierarchy dictionary storage means for storing subsets of templates that are sequentially generated by the subset generation means in association with a corresponding subset of templates prior to separation; decision tree classification means for receiving a hierarchy structure stored in the hierarchy dictionary storage means from the top level of the hierarchy to classify input patterns, and outputting a child node which is the result of the classification; and category determination means for reading out feature quantities effective for determining a template from leaf nodes of the hierarchy structure and performing major classification by use of the feature quantities. The subset generation means generates a decision tree by including a category that exists across a defined threshold value into subsets on both sides of the threshold value. This fast recognition and retrieval system can perform a fast retrieval in a stable required time without backtracking by optimizing the classification method for determining a subsequent category in accordance with the distribution of templates belonging to the leaf nodes of the decision tree, and registering a template that exists across the boundary between subsets by including the template in both the nodes when generating a decision tree.

However, the fast recognition and retrieval system of JP-A-2001-034709 can have a problem of inefficient access to a specific portion of a video required by the user, because an index properly representing the image content is not structured in association with a video source.

The patent documents cited herein are listed below, all of which are Japanese patent laid-open publications.

-   Patent Document 1: JP-A-2004-080587
-   Patent Document 2: JP-A-11-167583
-   Patent Document 3: JP-A-2003-345809
-   Patent Document 4: JP-A-2003-333265
-   Patent Document 5: JP-A-3-141484
-   Patent Document 6: JP-A-2001-034709

DISCLOSURE OF THE INVENTION

Problem to be Solved by the Invention

In summary, the related techniques outlined above have such problems as inefficient access to an image required by a user, low search efficiency when the user selects a desired scene, and inefficient access to a specific portion of a video required by the user.

An object of the present invention is to provide a video structuring device and method that can structure a character string display which properly represents the image content in association with a video source, and thereby improve efficiency of access to a specific portion of a video required by the user.

Another object of the invention is to provide a video structuring device and method that enables efficient access to a video of interest by analyzing the contents of a video and presenting resulting structure information as an index list of character string displays.

Another object of the invention is to provide a video structuring device and method that can present an index which is less affected by misrecognitions included in the result of character recognition of a character string present in a video.

Another object of the invention is to provide a video structuring device and method that can display a character string display or a recognized character string that represents the contents of a video to a user as an index for picture location.

Another object of the invention is to provide a video structuring device and method that can display a character string display or a recognized character string that represents the contents of a video to a user as an index for picture location, and allows the user to enter information for selecting the character string display or the recognized character string to locate a specific picture and play back the video starting from a frame image identified by the selected character string display or the recognized character string.

Another object of the invention is to provide a video structuring device and method that can preferentially display a recognized character string to the user when the recognition reliability obtained upon character recognition of the character string in a video is high, thereby allowing the user to utilize the display of a character string which represents the contents of the video more properly as an index for picture location.

Another object of the invention is to provide a video structuring device and method that can preferentially display a character string display in the form of an image to the user when the recognition reliability obtained upon character recognition of the character string in a video is low, thereby allowing the user to utilize the display of a character string which represents the contents of the video more properly as an index for picture location.

Another object of the invention is to provide a video structuring device and method that can inform the user that a character string has appeared in a video, for example when videos are sequentially supplied as input.

Another object of the invention is to provide a video structuring device and method that can inform the user that a predetermined character string has appeared in a video, for example when videos are sequentially supplied as input.

Means for Solving the Problem

According to a first aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video and frame identifying information which identifies the frame image; character string extraction means for receiving the frame image and the frame identifying information from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in the frame image as a character string present frame image, and outputting the character string position information, frame identifying information for identifying the character string present frame image and the character string present frame image; video information storage means for obtaining the frame identifying information, the character string present frame image and the character string position information from the character string extraction means, and storing the obtained pieces of information associated with one another in an index file; and structure information presentation means for reading out the index file from the video information storage means, cutting out an area in which a character string is present from the character string present frame image based on the character string position information, and displaying a character string display in the form of the cut-out image on display means in association with frame identifying information for identifying the character string present frame image. In this video structuring device, the character string position information is constituted from the coordinate values of a character string, for example.
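The association held in the index file and the cut-out step of the first aspect can be illustrated with a minimal sketch. The names `IndexEntry`, `cut_out` and `build_index_list`, the use of nested lists of pixel values as a frame image, and the `(left, top, right, bottom)` coordinate layout are all assumptions made for illustration; the text only states that the position information is constituted from coordinate values.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for a frame image: rows of grayscale pixel values.
Image = List[List[int]]


@dataclass
class IndexEntry:
    """One index file record; the three fields are stored associated with one another."""
    frame_id: int                          # frame identifying information
    frame_image: Image                     # character string present frame image
    position: Tuple[int, int, int, int]    # assumed layout: (left, top, right, bottom)


def cut_out(entry: IndexEntry) -> Image:
    """Cut out the area in which the character string is present."""
    left, top, right, bottom = entry.position
    return [row[left:right] for row in entry.frame_image[top:bottom]]


def build_index_list(index_file: List[IndexEntry]) -> List[Tuple[int, Image]]:
    """Pair each cut-out image with the frame identifying information of its frame."""
    return [(entry.frame_id, cut_out(entry)) for entry in index_file]


# A 4x6 dummy frame whose pixel at row r, column c has the value 10*r + c.
frame = [[10 * r + c for c in range(6)] for r in range(4)]
index_file = [IndexEntry(frame_id=120, frame_image=frame, position=(1, 2, 4, 4))]
listing = build_index_list(index_file)
```

Here `listing` holds one character string display (a 2x3 crop) tagged with frame 120, which is the pairing the structure information presentation means would put on the display means.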

According to a second aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video, frame identifying information for identifying the frame image, and video data for the video signal; character string extraction means for receiving the frame image and the frame identifying information from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in the frame image as a character string present frame image, and outputting the character string position information, frame identifying information for identifying the character string present frame image and the character string present frame image; structure information presentation means; video information storage means for obtaining the frame identifying information, the character string present frame image, and the character string position information from the character string extraction means to store them in an index file associated with one another, obtaining the video data and frame identifying information from the video input means to store them in association with one another, and when the video information storage means obtains the frame identifying information from the structure information presentation means, reading out video data which is recorded in association with the frame identifying information obtained from the structure information presentation means, and outputting video data starting from a frame image corresponding to the frame identifying information obtained from the structure information presentation means; and video playback means for obtaining video data outputted by the video information storage means and outputting the video data to display means for display. Here, the structure information presentation means reads out the index file from the video information storage means, cuts out an area in which a character string is present from the character string present frame image based on the character string position information, and outputs a character string display in the form of the cut-out image to the display means for display. When the user enters information for selecting the character string display, the structure information presentation means outputs frame identifying information associated with the selected character string display to the video information storage means.
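The selection-to-playback path of the second aspect might be sketched as follows. The class `VideoInformationStore`, the function `on_select`, the use of integer frame numbers as frame identifying information, and the use of strings as video data are hypothetical choices for illustration only.

```python
from typing import Dict, List, Tuple


class VideoInformationStore:
    """Hypothetical store keeping video data associated with frame identifying information."""

    def __init__(self) -> None:
        self._video: Dict[int, str] = {}

    def store(self, frame_id: int, frame_data: str) -> None:
        self._video[frame_id] = frame_data

    def read_from(self, frame_id: int) -> List[str]:
        # Output video data starting from the frame image matching frame_id.
        return [self._video[k] for k in sorted(self._video) if k >= frame_id]


def on_select(displayed: List[Tuple[int, str]], choice: int,
              store: VideoInformationStore) -> List[str]:
    """Pass the frame id tied to the chosen character string display to the store."""
    frame_id, _label = displayed[choice]
    return store.read_from(frame_id)


store = VideoInformationStore()
for fid in range(5):
    store.store(fid, f"frame-{fid}")

# Index list as shown to the user: (frame id, character string display).
displayed = [(1, "cut-out image A"), (3, "cut-out image B")]
playback = on_select(displayed, 1, store)   # user picks the second entry
```

Selecting the second entry yields the video data from frame 3 onward, which the video playback means would then send to the display means.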

According to a third aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video and frame identifying information which identifies the frame image; character string extraction means for receiving the frame image and the frame identifying information from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in the frame image as a character string present frame image, and outputting the character string position information, frame identifying information for identifying the character string present frame image and the character string present frame image; character string recognition means for obtaining the frame identifying information, the character string present frame image and the character string position information from the character string extraction means, cutting out an area in which a character string is present from the character string present frame image based on the character string position information, applying character string recognition processing to the cut-out image to obtain a recognized character string in the form of character codes, and outputting the recognized character string, the frame identifying information, and the character string position information; video information storage means for obtaining the frame identifying information, the character string present frame image, and the character string position information from the character string extraction means, obtaining the recognized character string, the frame identifying information and the character string position information from the character string recognition means, and storing the obtained image and information in an index file in association with one another; and structure information presentation means capable of reading out the index file from the video information storage means, cutting out an area in which a character string is present from the character string present frame image based on the character string position information, and displaying a character string display in the form of the cut-out image and the recognized character string on display means in association with frame identifying information for identifying the character string present frame image.

According to a fourth aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video, frame identifying information for identifying the frame image, and video data for the video signal; character string extraction means for receiving the frame image and the frame identifying information from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in the frame image as a character string present frame image, and outputting the character string position information, frame identifying information for identifying the character string present frame image, and the character string present frame image; character string recognition means for obtaining the frame identifying information, the character string present frame image, and the character string position information from the character string extraction means, cutting out an area in which a character string is present from the character string present frame image based on the character string position information, applying character string recognition processing to the cut-out image to obtain a recognized character string in the form of character codes, and outputting the recognized character string, the frame identifying information, and the character string position information; structure information presentation means; video information storage means for obtaining the frame identifying information, the character string present frame image and the character string position information from the character string extraction means, obtaining the recognized character string, the frame identifying information, and the character string position information from the character string recognition means, and storing the obtained image and information in an index file in association with one another, storing the video data and the frame identifying information obtained from the video input means in association with one another, and when the video information storage means obtains the frame identifying information from the structure information presentation means, reading out video data which is recorded in association with the frame identifying information obtained from the structure information presentation means, and outputting video data starting from a frame image corresponding to the frame identifying information obtained from the structure information presentation means; and video playback means for obtaining video data outputted by the video information storage means and outputting the obtained video data to display means for display. Here, the structure information presentation means can read out the index file from the video information storage means, cut out an area in which the character string is present from the character string present frame image based on the character string position information, and output a character string display in the form of the cut-out image and the recognized character string to the display means for display. When the user enters information for selecting the displayed character string display or recognized character string, the structure information presentation means outputs frame identifying information associated with the selected character string display or recognized character string to the video information storage means.

In the present invention, the character string recognition means may calculate the recognition reliability for a character string and output it to the video information storage means. The recognition reliability may be, for example, a likelihood value for character recognition on individual characters in a character string image, or the inverse of the average of a distance value. When the recognition reliability is calculated, the video information storage means stores the recognition reliability obtained from the character string recognition means in the index file in association with the character string position information, and the structure information presentation means compares the recognition reliability with a predetermined threshold value. If it determines that the recognition reliability of character string recognition is greater than the predetermined threshold value, the structure information presentation means may output a recognized character string to the display means for display without displaying a character string display in the form of an image. Alternatively, if the structure information presentation means compares the recognition reliability with the predetermined threshold value and determines that the reliability of character string recognition is smaller than the threshold value, it may output a character string display in the form of an image to the display means for display without displaying the recognized character string. By selecting either the character string display or the recognized character string for preferential display in accordance with the recognition reliability in this way, the user can use whichever of the two represents the contents of a video more properly as an index for picture location.
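The reliability-based choice described above can be sketched under a few stated assumptions: reliability is taken as the inverse of the average per-character distance value (one of the two options the text mentions), distances are assumed to be positive, and a reliability exactly equal to the threshold is treated as falling back to the image, a case the text leaves open. The function names are hypothetical.

```python
from typing import List, Tuple


def reliability_from_distances(distances: List[float]) -> float:
    """Inverse of the average distance value over the characters of one string."""
    mean = sum(distances) / len(distances)
    if mean <= 0:
        # Assumption: distance values are strictly positive.
        raise ValueError("average distance must be positive")
    return 1.0 / mean


def choose_display(recognized: str, cut_out_image: str,
                   reliability: float, threshold: float) -> Tuple[str, str]:
    """Prefer the recognized string when reliability exceeds the threshold."""
    if reliability > threshold:
        return ("text", recognized)
    # Low (or, by assumption, equal) reliability: show the cut-out image instead.
    return ("image", cut_out_image)


high = reliability_from_distances([0.5, 0.5])   # small distances, high reliability
low = reliability_from_distances([2.0, 6.0])    # large distances, low reliability

shown_high = choose_display("WEATHER", "<crop 1>", high, threshold=1.0)
shown_low = choose_display("WEA7HER", "<crop 2>", low, threshold=1.0)
```

With a threshold of 1.0, the first string is shown as text and the second, probably misrecognized, string is shown as its cut-out image.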

Further, in the present invention, the structure information presentation means may have display means show information to the effect that a character string is present in a video and/or have audio output means emit sound when it determines that new character string position information is present. By adopting such a construction, the user can learn that a character string has appeared in a video, for example when videos are sequentially inputted, and can also utilize a character string display or a recognized character string that properly represents the contents of the video as an index for picture location.

According to a fifth aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video; character string extraction means for receiving the frame image from the video input means to determine whether a character string is present in the frame image, and if it determines that a character string is present in the frame image, outputting information to the effect that a character string is present; and structure information presentation means for having display means show information to the effect that a character string is present in a video and/or having audio output means emit sound when the structure information presentation means obtains information to the effect that the character string is present from the character string extraction means.

According to a sixth aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video; character string extraction means for receiving the frame image from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in a character string present frame image in which the character string is present, and outputting the character string position information; and structure information presentation means for having display means show information to the effect that a character string is present in the video and/or having audio output means output sound when the structure information presentation means obtains the character string position information from the character string extraction means.

According to a seventh aspect of the invention, a video structuring device includes: video input means for receiving a video signal and outputting a frame image of a video and frame identifying information which identifies the frame image; character string extraction means for receiving the frame image from the video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, outputting a character string present frame image in which the character string is present and character string position information for the character string present in the frame image; character string recognition means for obtaining the character string present frame image and the character string position information from the character string extraction means, cutting out an area in which a character string is present from the character string present frame image based on the character string position information, applying character string recognition processing to the cut-out image to obtain a recognized character string in the form of character codes, and outputting the recognized character string and the character string position information; and structure information presentation means for obtaining the recognized character string from the character string recognition means, determining whether or not the obtained recognized character string is a character string included in a group of predetermined keywords, and if it determines that the obtained recognized character string is a character string included in the predetermined keywords, having display means show information to the effect that a character string is present in the video and/or having audio output means emit sound. Adoption of such a configuration enables the user to learn that a predetermined character string has appeared in a video, such as when videos are sequentially inputted.
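The keyword check in the seventh aspect can be illustrated with a minimal sketch. The function name `should_notify`, the sample keyword set, and the choice of exact matching after trimming whitespace are all assumptions made for illustration; the specification does not state how the comparison is performed.

```python
from __future__ import annotations


def should_notify(recognized: str, keywords: set[str]) -> bool:
    """Return True when the recognized character string is one of the
    predetermined keywords (assumption: exact match after trimming)."""
    return recognized.strip() in keywords


# Hypothetical keyword group; the specification does not list any keywords.
KEYWORDS = {"breaking news", "weather"}

if should_notify(" breaking news ", KEYWORDS):
    # A real device would drive the display means and/or audio output means here.
    print("A character string is present in the video")
```

A looser comparison (case folding, substring matching) would be an equally plausible reading; it only changes the body of `should_notify`.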

According to the invention, since an index such as a character string display or a recognized character string properly representing the contents of video content is presented in association with video data (or a video source), the user can efficiently access a specific portion of a video he or she requires. For most video content, character information appearing in a video is likely to properly reflect the contents of the video, and the user is enabled to efficiently access a required portion of a video by associating an index which is generated at the time of appearance of character information with video data. Even when character information irrelevant to the contents of a video, such as “breaking news”, is contained in a video, the user can promptly decide whether or not to view the portion of the video corresponding to the “breaking news” by seeing an index in the form of a character string display.

According to the invention, even in a case where character information appearing in a video is automatically recognized to obtain character codes and a resulting recognized character string is utilized as an index, display can be switched between character string display in the form of an image and display of a recognized character string based on the recognition reliability of the recognized character string. Consequently, a specific portion of a video can be accessed more reliably and a video can be searched with improved efficiency, which can reduce the user's burden of selecting operations.
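The switching between image display and recognized-text display can be sketched as follows. The `IndexEntry` fields, the 0.0 to 1.0 reliability scale, and the threshold value of 0.8 are assumptions for illustration only; the specification names recognition reliability as the criterion but gives no scale or threshold here.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    image_crop: str     # hypothetical reference to the cut-out character string image
    recognized: str     # recognized character string (character codes)
    reliability: float  # assumed scale: 0.0 (unreliable) to 1.0 (reliable)


def choose_display(entry: IndexEntry, threshold: float = 0.8):
    """Show recognized text when reliability is high enough, else the image."""
    if entry.reliability >= threshold:
        return ("text", entry.recognized)
    return ("image", entry.image_crop)
```

For example, `choose_display(IndexEntry("crop-101.jpg", "Character string", 0.4))` falls back to the image form, so a poorly recognized string is never shown as misleading text.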

Furthermore, according to the invention, the user can learn that a character string has appeared in a video even when videos are sequentially inputted. In addition, when being notified that a new character string has appeared in a video, the user can enter information for selecting the display of that character string or a recognized character string to play back and view a video starting from a frame image corresponding to the selected character string display or recognized character string.

According to the invention, the user can utilize a character string display or a recognized character string that properly represents the contents of a video as an index for picture location, and can find a desired picture location of the video by selecting such character string display or recognized character string properly representing the contents of the video.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an exemplary configuration of a video structuring system that includes the video structuring device according to the invention;

FIG. 2 is a block diagram showing the video structuring device according to a first exemplary embodiment;

FIG. 3 is a view showing time-series frame images obtained by decoding a video file having video identifying information “ABC.MPG”;

FIG. 4 is a view showing an example of index information outputted by a character string extraction unit based on the video file shown in FIG. 3;

FIG. 5 is a view showing an example of the contents of a first index file that contains the index information shown in FIG. 4;

FIG. 6 is a view showing an example of index list display;

FIG. 7 is a block diagram showing a signal processing system of the video structuring device according to a second exemplary embodiment;

FIG. 8 is a flowchart illustrating video structuring processing in the video structuring device shown in FIG. 7;

FIG. 9 is a flowchart showing an example of character string extraction processing;

FIG. 10 is a block diagram showing the video structuring device according to a third exemplary embodiment;

FIG. 11 is a block diagram showing the video structuring device according to a fourth exemplary embodiment;

FIG. 12 is a view showing an example of the contents of a second index file;

FIG. 13 is a view showing an example of index list display;

FIG. 14 is a block diagram showing the video structuring device according to a fifth exemplary embodiment;

FIG. 15 is a block diagram showing the video structuring device according to a sixth exemplary embodiment;

FIG. 16 is a block diagram showing the video structuring device according to a seventh exemplary embodiment;

FIG. 17 is a block diagram showing the video structuring device according to an eighth exemplary embodiment;

FIG. 18 is a block diagram showing the video structuring device according to a ninth exemplary embodiment;

FIG. 19 is a view showing another example of the index list display; and

FIG. 20 is a view showing another example of the index list display.

DESCRIPTION OF SYMBOLS

-   Video structuring system;
-   12, 14 Imaging device;
-   16 Video database;
-   18, 22 Antenna;
-   20 Video output device;
-   24 Base station;
-   30 Communication network;
-   100, 200, 300, 400, 500, 600, 700, 800, 900 Video structuring device;
-   101, 102 Frame image;
-   103 Character string;
-   104, 105 Time of shooting;
-   106 Character string;
-   120 Title of index list display;
-   122 Video identifying information display field;
-   124 Frame identifying information;
-   126 Character string display;
-   128 Character string present frame image;
-   138, 139 Recognized character string;
-   170 Input device;
-   172 Display device;
-   210, 310, 410, 510, 610, 710, 810, 910 Video input unit;
-   212, 312, 412, 512, 612, 712, 812, 912 Character string extraction unit;
-   216, 316, 416, 516, 816, 916 Video information storage unit;
-   218, 318, 418, 518, 618, 718, 818, 918 Structure information presentation unit;
-   320, 520, 920 Video playback unit;
-   414, 514, 714, 814, 914 Character string recognition unit;
-   951 Image processing unit;
-   953 Compression/decompression unit;
-   955 Audio processing unit;
-   956 Audio output device;
-   957 Vocalization processing unit;
-   965, 968 Transmission/reception unit;
-   971 Input interface;
-   973 Display interface;
-   977 Recording medium;
-   978 Recording medium mounting unit;
-   979 Recording medium interface;
-   980 Information processing unit;
-   981 Memory;
-   984 Recording unit;
-   990 Calendar clock; and
-   999 Bus.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 shows an exemplary configuration of a video structuring system that includes the video structuring device according to the present invention. The video structuring system includes: imaging device 12 for forming an image of a subject on a light receiving surface and performing photoelectric conversion of the image to output a video signal for the image; video output device 20 for converting the video signal for a taken image into video data for transmission and outputting the same to communication network 30; and video structuring device 100 according to the invention. The video structuring device may also be any of video structuring devices 200, 300, 400, 500, 600, 700, 800 and 900 according to the exemplary embodiments to be discussed below.

Video output device 20 is configured to be able to convert a video signal for a taken image into video data for wireless transmission, and transmit the video data to base station 24 and/or video structuring device 100 via antenna 18. Video output device 20 is also configured to be able to convert a video signal for a taken image into video data for recording and record the video data into video database 16. Video output device 20 is further configured to be able to read out video data recorded in video database 16, convert it into video data for transmission, and output the data to communication network 30. The video data may be composite video signals or the like. As communication network 30, a network for cable television may be utilized.

Video output device 20 also has the function of reading video data recorded in video database 16, converting the data into video data for wireless transmission, and transmitting the video data to base station 24 and/or video structuring device 100 via antennas 18, 22. Video output device 20 also has the function of receiving, by use of antenna 18 or the like, video data which is transmitted by base station 24 or video structuring device 100 using a wireless or wired communication means, and recording the data into video database 16.

Base station 24 has the function of receiving, by use of antenna 22, video data outputted from antenna 18 of video output device 20, and converting the data into video data for wired transmission before outputting it to video structuring device 100 via communication network 30. Base station 24 also has the function of receiving video data and/or various information, such as index information for video, which is transmitted by video structuring device 100, and transmitting it via antenna 22 to video output device 20 and/or a communication device (not shown) such as a mobile phone or a mobile terminal.

Video structuring device 100 has the function of receiving a video signal outputted by imaging device 14 or video output device 20 via a video input unit or a video signal input unit, which will be described below, and extracting time-series frame images from the video signal. It then generates index information that associates frame identifying information, which identifies a frame image containing a character string portion such as telops (subtitles), with character string position information, which identifies the position or area of the character string portion within the frame image. The frame identifying information used herein includes time information, counter information, and page information, for example. Then, video structuring device 100 outputs the generated index information to another communication device via communication means, such as communication network 30 or wireless transmission. Imaging device 14 may also contain a microphone or the like so that it can output audio signals.

Video structuring device 100 also has the function of recording generated index information to a recording unit provided in video structuring device 100, or to a recording medium. Video structuring device 100 further has the function of extracting an image of a character string portion contained in a frame image based on the frame identifying information and the character string position information, both of which are included in the generated index information, and generating display data for index list display. An image of a character string portion includes a character string display or a character string image. The display data is delivered from video structuring device 100 to display device 172, thereby enabling an index list display to be provided to the user.

In this video structuring system, when the user views an index list display including character string displays or character string images and selects a desired character string display or character string image by using input device 170, such as a keyboard or a mouse, an image file containing the frame image is retrieved based on frame identifying information or the like which is associated with the character string display or the like. As a result, the video structuring system can start playback from the position of the frame.

FIG. 2 shows a video structuring device according to a first exemplary embodiment that has a similar configuration to the one described above. Video structuring device 200 shown in FIG. 2 includes: video input unit 210 which, with input of digitalized video data or a video signal, outputs frame images or time-series frame images, frame identifying information for identifying the individual frame images, and video identifying information; character string extraction unit 212 to which frame images or time-series frame images are supplied from video input unit 210 and which determines whether or not any character string is present in the frame images, and when it determines that a character string is present, outputs frame identifying information for a character string present frame image in which the character string is present and character string position information such as coordinate values of the character string within the frame image; video information storage unit 216 which stores index information which associates the character string present frame image, character string position information and frame identifying information with each other as a first index file, and also stores video data; and structure information presentation unit 218 which retrieves the stored first index file, and outputs to display device 172 a frame image in which the character string is present or a character string image corresponding to character string position information. Here, the video signal includes an RGB signal, a composite video signal, or like signals.

With this configuration, video input unit 210 has the function of, upon receipt of digitalized video data, or a video signal such as an RGB or composite video signal, outputting video identifying information that identifies the entire video, the digitalized video data, and frame identifying information which identifies frame images during playback of each frame image of the video data, to video information storage unit 216. Video input unit 210 also has the function of, when it receives such video data and/or video signal, generating frame images or time-series frame images from the input video signal, and also outputting video identifying information for identifying the entire video as well as individual frame images or time-series frame images to character string extraction unit 212 together with frame identifying information which identifies the individual frame images separately.

To character string extraction unit 212, video identifying information such as the name of a file in which a video is recorded or a program title, a frame image, and the second frame identifying information are entered from video input unit 210. Character string extraction unit 212 determines whether any character string is present in the input frame image. If it determines that a character string is present in the input frame image, character string extraction unit 212 outputs the video identifying information, a character string present frame image, frame identifying information for identifying the specific frame image in which the character string is present, and character string position information for the character string within the frame image to video information storage unit 216 as index information. A character string present frame image refers to a frame image which is detected as containing a character string; however, it may also be a thumbnail image which is produced by reducing such a frame image in size as necessary. The character string position information may be coordinate values which indicate where in a frame image a detected character string is present, for example. Structure information presentation unit 218 presents character string display in the form of an image to the user based on index information thus obtained.

In this exemplary embodiment, any frame identifying information is for identifying individual frame images. As the frame identifying information, information such as the time of shooting, a frame image number, or counter information can be used. Time information for synchronized reproduction such as PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp), or reference time information such as SCR (System Clock Reference), may be used as the time information.

Character string extraction unit 212 first receives video identifying information, a first frame image, and frame identifying information which identifies the individual frame images as input from video input unit 210, and determines whether or not any character string is present in the frame image. If it determines that a character string is present in the frame image, character string extraction unit 212 then outputs video identifying information for that video, a character string present frame image, frame identifying information for identifying the specific frame image in which the character string is present, and character string position information such as coordinate values of the character string within the frame image to video information storage unit 216 as first index information. Here, if the same character string is present in a number of frame images, the specific frame image in which the character string is present is preferably the first one of the frame images which include the same character string. If no character string is present in the frame images, character string extraction unit 212 does not output frame identifying information and character string position information.

Then, character string extraction unit 212 determines whether or not a character string is present in the second frame image. If it determines that a character string is present in the frame image, character string extraction unit 212 outputs frame identifying information for identifying the character string present frame image in which the character string is present and character string position information such as coordinate values of the character string within the frame image. Character string extraction unit 212 repeats this processing for each subsequent frame image in sequence.
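The frame-by-frame loop above, together with the preference for indexing only the first frame in which a given character string appears, can be sketched as follows. How the device decides that two frames contain "the same" character string is not stated; this sketch assumes, purely for illustration, that an unchanged position rectangle in consecutive frames means the same string. The function name and the tuple layout are likewise hypothetical.

```python
def first_appearances(detections):
    """Keep only the first frame of each run of identical detections.

    detections: iterable of (frame_id, box) in time order, where box is a
    hashable rectangle such as (x0, y0, x1, y1), or None when no character
    string was found in that frame.
    """
    index = []
    previous = None
    for frame_id, box in detections:
        # Emit an index entry only when a string appears or its box changes.
        if box is not None and box != previous:
            index.append((frame_id, box))
        previous = box
    return index
```

Frames without a character string produce no output, matching the behaviour described for character string extraction unit 212.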

Here, exemplary processing for extracting a character string performed by character string extraction unit 212 will be described. Character string extraction unit 212 first differentiates an input frame image to generate a differentiated image. Character string extraction unit 212 then binarizes each pixel value of the differentiated image with a predetermined threshold value, and projects a resulting binarized image in the horizontal and vertical directions to generate a histogram of pixels, thereby obtaining a projection pattern.

Next, character string extraction unit 212 defines a continuous area having the value of the projection pattern equal to or greater than a predetermined value as a character string candidate area. Here, it may omit any continuous area having a size smaller than a predetermined value from the character string candidate areas as noise. Then, by applying layout analysis processing to each of the character string candidate areas determined based on projection patterns, final character string position information can be generated.
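The differentiate, binarize, project and threshold steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the image is a list of rows of grey-level integers, differentiation is a simple horizontal neighbour difference (the specification does not name a particular operator), and all function names and parameter values are hypothetical. The layout analysis step is not covered.

```python
def differentiate(image):
    """Absolute difference between horizontally adjacent pixels.
    The result is one column narrower than the input."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in image]


def binarize(image, threshold):
    """Map each pixel to 1 if it reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in image]


def projections(binary):
    """Projection pattern: count of set pixels per row and per column."""
    rows = [sum(row) for row in binary]
    cols = [sum(col) for col in zip(*binary)]
    return rows, cols


def runs(profile, min_value, min_size):
    """Continuous runs (start, end inclusive) where profile >= min_value,
    dropping runs shorter than min_size as noise."""
    found, start = [], None
    for i, v in enumerate(profile):
        if v >= min_value and start is None:
            start = i
        elif v < min_value and start is not None:
            if i - start >= min_size:
                found.append((start, i - 1))
            start = None
    if start is not None and len(profile) - start >= min_size:
        found.append((start, len(profile) - 1))
    return found


def candidate_areas(image, threshold, min_value=1, min_size=2):
    """Row and column extents of character string candidate areas."""
    rows, cols = projections(binarize(differentiate(image), threshold))
    return runs(rows, min_value, min_size), runs(cols, min_value, min_size)
```

Crossing each row run with each column run gives candidate rectangles, which the layout analysis described next would then refine.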

For layout analysis processing, a method like “Document layout analysis by extended split detection method”, described on pages 406 to 415 of the proceedings of “the IAPR Workshop on Document Analysis Systems” held in 1998, can be employed, for example. This layout analysis processing extracts image areas other than characters and performs area division with the position of the image areas as boundaries to divide the areas into sub-areas. By recursively applying this process to the sub-areas, position information for a character string can be finally obtained, e.g., as coordinate values within the image.

Noise may remain in the character string candidate areas when much content other than characters is over-extracted from a background image. However, with the layout analysis method described above, such noise is removed in the course of the recursive processing as areas other than character strings. Consequently, the method described here can extract only character strings. Character string position information may be information representing the smallest rectangle that surrounds one character string or information representing a shape that is a combination of a number of rectangles.

FIG. 3 is a view showing time-series frame images that are obtained by decoding a video file with video identifying information of “ABC.MPG”, for example, as well as character strings contained in the frame images.

Decoding of the video file “ABC.MPG” by video input unit 210 produces one or more frame images as shown in the figure. When a video signal such as an RGB signal or a YC signal (or composite signal) is supplied to video input unit 210, one or more frame images such as those shown in FIG. 3 can also be obtained by digitizing the time-series frame images.

Character string extraction unit 212 receives video identifying information for the file “ABC.MPG”, individual frame images, and frame identifying information which identifies these individual frame images from video input unit 210, and determines whether or not a character string is present in the frame images. Although the illustrated example uses a video file name as video identifying information, a program title from an electronic program guide (EPG) and the like may also be used. In the illustrated example, shooting time information is used as frame identifying information. In the following, processing performed in video structuring device 200 shown in FIG. 2 will be described with reference to a case where a series of frame images, such as shown in FIG. 3, is entered.

In this example, since character string 103 that reads “Character string contained in video” is present in frame image 101 of shooting time 104 (1:23:14′33), character string extraction unit 212 outputs video identifying information “ABC.MPG” that identifies the entire video, video data for frame image 101 which is reduced in size as required, frame identifying information for identifying character string present frame image 101 in which the character string is present, and character string position information which includes the coordinates Pa101 (120, 400) and Pb101 (600, 450) of the character string within the frame image, to video information storage unit 216 as index information. As the frame identifying information for identifying character string present frame image 101, a file name “ABC-01231433.JPG” can be used, for example.

The example shown in FIG. 3 uses, as the coordinate system for character strings, a coordinate system whose origin is the upper left pixel of a frame image. Herein, the coordinate value of the upper left vertex of the smallest rectangle that surrounds a character string is defined as Pa and that of the lower right vertex of the rectangle as Pb.
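Cutting out the character string area from a frame image using Pa and Pb can be sketched as follows. The frame is modelled as a list of pixel rows, the function name `cut_out` is hypothetical, and treating both vertices as inclusive is an assumption; the specification does not say whether Pb lies inside the rectangle.

```python
def cut_out(frame, pa, pb):
    """Return the sub-image bounded by Pa (upper left) and Pb (lower right).

    frame: list of rows, each row a list of pixels; origin at the upper left.
    pa, pb: (x, y) tuples, both assumed to be inclusive.
    """
    (x0, y0), (x1, y1) = pa, pb
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
```

With the values from FIG. 3, `cut_out(frame, (120, 400), (600, 450))` would yield an image 481 pixels wide and 51 pixels high under the inclusive assumption.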

Similarly, since there is character string 106 that reads “Character string” in frame image 102 of shooting time 105 (2:54:04′67), character string extraction unit 212 outputs video identifying information “ABC.MPG” which identifies the entire video, video data for frame image 102 which is reduced in size as necessary, frame identifying information for identifying character string present frame image 102, and character string position information which includes the coordinates Pa102 (20, 100) and Pb102 (120, 150) of the character string existing in the frame image to video information storage unit 216 as index information. As the frame identifying information, a file name “ABC-02540467.JPG” can be used, for example.
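The two example file names appear to be formed from the video file name and the digits of the shooting time. A minimal sketch of that naming pattern follows; the function name is hypothetical, and the meaning of the last two digits (for instance hundredths of a second or a frame count) is not stated in the text, so it is passed through as an opaque two-digit field.

```python
from pathlib import PurePath


def frame_file_name(video_id, hours, minutes, seconds, fraction):
    """Build a frame identifying file name such as 'ABC-01231433.JPG'.

    fraction: the two digits after the prime mark in the shooting time;
    its unit is an assumption left open here.
    """
    stem = PurePath(video_id).stem  # 'ABC.MPG' -> 'ABC'
    return f"{stem}-{hours:02d}{minutes:02d}{seconds:02d}{fraction:02d}.JPG"


print(frame_file_name("ABC.MPG", 1, 23, 14, 33))   # ABC-01231433.JPG
print(frame_file_name("ABC.MPG", 2, 54, 4, 67))    # ABC-02540467.JPG
```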

FIG. 4 shows an example of index information which is outputted by character string extraction unit 212 based on the video file shown in FIG. 3. As shown in FIG. 4, the index information outputted by character string extraction unit 212 includes video identifying information “ABC.MPG” which identifies the video file, frame identifying information which identifies a frame image in which a character string is present, and character string position information for the character string existing in the frame image. The frame identifying information may be a file name “ABC-01231433.JPG” or the like, for example, and the character string position information may be coordinates Pa101 (120, 400) and Pb101 (600, 450), or the like, for example.

Video information storage unit 216 stores, as a first index file, the first index information outputted by character string extraction unit 212, which associates the video identifying information, the character string present frame image in which the character string is present, the frame identifying information that identifies the character string present frame image, and the character string position information with each other. Video information storage unit 216 also stores the video identifying information, video data and frame identifying information outputted by video input unit 210 as video data.

FIG. 5 is a view showing an example of the first index file containing the index information shown in FIG. 4.

As shown, the first index file (INDEX01.XML) includes index information for other video files (e.g., “DEF.MPG”) in addition to one or more pieces of index information for the video file “ABC.MPG” shown in FIG. 4. The first index file is not limited to a file having a database structure such as XML (Extensible Markup Language), but may be a file in a format for display such as HTML (HyperText Markup Language) or in another file format.
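An XML index file of this kind might be built and queried as in the sketch below. The element and attribute names (`index`, `entry`, `position`, `video`, `frame`, `pa`, `pb`) are hypothetical, since the actual schema of FIG. 5 is not reproduced in the text, and the “DEF.MPG” entry uses made-up placeholder values.

```python
import xml.etree.ElementTree as ET

# Sample index information; the ABC.MPG values come from FIG. 4,
# the DEF.MPG entry is a placeholder invented for illustration.
SAMPLE = [
    {"video": "ABC.MPG", "frame": "ABC-01231433.JPG", "pa": (120, 400), "pb": (600, 450)},
    {"video": "ABC.MPG", "frame": "ABC-02540467.JPG", "pa": (20, 100), "pb": (120, 150)},
    {"video": "DEF.MPG", "frame": "DEF-00000100.JPG", "pa": (0, 0), "pb": (10, 10)},
]


def build_index(entries):
    """Build an XML tree associating video, frame and position information."""
    root = ET.Element("index")
    for e in entries:
        item = ET.SubElement(root, "entry", video=e["video"], frame=e["frame"])
        (ax, ay), (bx, by) = e["pa"], e["pb"]
        ET.SubElement(item, "position", pa=f"{ax},{ay}", pb=f"{bx},{by}")
    return root


def find_entries(root, video):
    """Return the frame identifying information recorded for one video."""
    return [item.get("frame") for item in root.findall("entry") if item.get("video") == video]


root = build_index(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

Writing the tree with `ET.ElementTree(root).write("INDEX01.XML")` would produce a single index file covering several videos, in line with the description above.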

Structure information presentation unit 218 retrieves an index file stored by video information storage unit 216, generates index list display information, and outputs the information to display device 172. Display device 172 presents the index list display to the user. FIG. 6 shows an example of the index list display.

As shown in FIG. 6, the index list display indicates title 120 of the index list display, video identifying information display field 122 for identifying the video file, frame identifying information 124 such as the time of shooting for identifying a character string present frame image in which a character string is present, and character string display 126 in the form of an image which is created by cutting out an area in which the character string is present from a frame image by using the frame identifying information, video data for the frame image, and character string position information. Character string display 126 may be displayed in an order or at a position desired by the user. An index list may be displayed at time intervals desired by the user.

The user can select replay point information such as a desired character string display 126 and/or time of shooting by manipulating input device 170 such as a mouse or a keyboard. The replay point information is information indicating from where a video should be played back, and is represented by frame identifying information. When the user selects desired character string display 126 or the like to designate a replay point of the video, the video file of the selected video identifying information is retrieved and a video starting from the frame image identified by corresponding frame identifying information 124 will be displayed on display device 172. The example shown here employs the time of shooting as replay point information.

FIG. 7 shows the configuration of a signal processing system of the video structuring device according to a second exemplary embodiment. The video structuring device shown in FIG. 7 is realized by a program installed in a computer system controlling the hardware resources of the computer system. When the video structuring device receives a video as input and determines that a character string is present in a frame image of the input video, it can output, as index information, video identifying information for the video, a character string present frame image which can be reduced in size as necessary such as a thumbnail, frame identifying information which identifies the specific character string present frame image in which the character string is present, and character string position information such as coordinate values of the character string present in the frame image.

Video structuring device 950 receives video signals from imaging device 14, which forms a subject image on a light receiving surface and performs photoelectric conversion of the image to output a video signal for the image. Video structuring device 950 includes: image processing unit 951 for converting an input video signal into video data for recording; audio processing unit 955 to which audio signals collected by imaging device 14 are entered and which converts them into audio or video data for recording; transmission/reception unit 965 for inputting and outputting video data, audio data, or other various information from and to communication network 30; and antenna 20 and transmission/reception unit 968 for transmitting and receiving video data, audio data, or other various information to and from a radio communication network.

Video structuring device 950 also includes compression/decompression unit 953, recording medium mounting unit 978, recording medium interface 979, input interface 971, display interface 973, information processing unit 980, memory 981, recording unit 984, and calendar clock 990.

Compression/decompression unit 953 performs compression control of a video and decompression control of a compressed video for video or audio data by a method represented by MPEG (Moving Picture Experts Group). Compression/decompression unit 953 also performs compression control of an image and decompression control of a compressed image for video data by a method represented by JPEG (Joint Photographic Experts Group).

Recording medium 977 can be removably mounted to recording medium mounting unit 978. Recording medium interface 979 is for recording and reading various information to and from recording medium 977. Recording medium 977 is a removable recording medium, such as a semiconductor medium like a memory card, an optical recording medium represented by DVD and CD, or a magnetic recording medium.

Input interface 971 transmits/receives information to and from input device 170, which may be a keyboard, a mouse and the like used for entering various instructions such as to start or finish index list display, select a video file, or select a character string display or a character string image. Display interface 973 outputs image signals for display to display device 172, which displays information such as images and characters.

Information processing unit 980 may be composed of a CPU, for example, and it performs such processing as input of video signals, generation of frame images or frame identifying information from video signals, determination of whether there is a character string in a frame image, generation of character string position information, association of various information, cutting out of an area in which a character string is present from a frame image, and other overall control of video structuring device 950. Memory 981 is used as a work area during program execution. Recording unit 984 is formed of a hard disk and the like for recording processing programs executed by video structuring device 950 and various constants, as well as various information such as addresses for use in communication connection with communication devices on a network, dial-up telephone numbers, attribute information, URLs (Uniform Resource Locators), gateway information, and DNS (Domain Name System) information. Calendar clock 990 is used for timekeeping.

In video structuring device 950, information processing unit 980 is connected to peripheral circuits of the information processing unit by bus 999, which enables fast transfer of information among them. Information processing unit 980 can control the peripheral circuits based on instructions of processing programs running in information processing unit 980.

Video structuring device 950 may also be a dedicated apparatus having processing ability associated with structuring of video information. Alternatively, a generic processing device such as a video recorder, a video camera, a digital still camera, a mobile phone equipped with a camera, a PHS (Personal Handyphone System), a PDA (Personal Data Assistance or Personal Digital Assistant: a mobile information and communication device for personal use), or a personal computer may be used as video structuring device 950.

Here, image processing unit 951, transmission/reception units 965, 968, recording medium interface 979, recording unit 984 and so forth can each function as a video signal input unit, being capable of receiving digitalized video data, or video signals such as an RGB signal and a composite video signal. By incorporating television tuner functions into transmission/reception unit 968, video signals can also be supplied to video structuring device 950 from an external device.

Display device 172, which is a liquid crystal display device, a CRT (cathode-ray tube) or the like, is used for displaying various information such as character string images, recognized character strings, images, characters and index list display, for notification of such information to the user. Audio output device 956, such as a speaker, is used for conveying information indicating the presence of a character string within a video to the user by voice, based on audio signals outputted by vocalization processing unit 957.

Information processing unit 980 has the functions of: generating, from an input video signal, frame images for the video and frame identifying information identifying the frame images; determining whether or not a character string is present in a generated frame image, and if it determines that a character string is present in the frame image, generating character string position information such as coordinate values of the character string present in the character string present frame image in which the character string is present; and generating a character string image by cutting out an area in which the character string is present from the character string present frame image based on the character string position information.

Next, processing performed by the video structuring device shown in FIG. 7 will be described using the flowchart of FIG. 8.

Processing performed by information processing unit 980 of video structuring device 950 proceeds to “video structuring processing” (box S1200) when the user enters an instruction to start video structuring processing, when a video signal is outputted from video output device 20, when the start time for video structuring processing set in the calendar clock of video structuring device 950 is reached, or when the start of video structuring processing is otherwise instructed. Information processing unit 980 then waits for transmission of a video signal from video output device 20 or imaging device 14.

At “video output processing” (box S1202), when video output device 20, imaging device 14 or the like outputs video signals in RGB, YC, MPEG, or other formats, image processing unit 951, transmission/reception units 965 and 968, and so forth of video structuring device 950 receive the video signals at “video input processing” (box S1210), and output digitalized time-series video data to information processing unit 980, compression/decompression unit 953, memory 981 and so forth via bus 999.

When video signals such as RGB or YC signals are supplied from video output device 20, imaging device 14 or the like, RGB video signals, YC composite signals or like signals are supplied to image processing unit 951. Image processing unit 951 outputs digitalized time-series video data, along with frame identifying information which identifies frame images during playback of each frame image of the video data, to information processing unit 980, compression/decompression unit 953, memory 981 and so forth via bus 999. Similarly, when video output device 20 or imaging device 14 outputs audio signals, the audio signals are supplied to audio processing unit 955, which associates digitalized audio data with video data and outputs the data to information processing unit 980, compression/decompression unit 953, memory 981 and so forth via bus 999.

Next, information processing unit 980 adds video identifying information for identifying the entire video to the time-series image data outputted by image processing unit 951, and applies compression processing (or encoding processing) based on a standard such as MPEG to the time-series image data at compression/decompression unit 953. In this state, information processing unit 980 manages the video identifying information that identifies the entire video, digitalized time-series video data, and frame identifying information for identifying frame images during playback of each frame image of the video data, which are associated with each other. For the video identifying information for identifying the entire video, the name of a file in which the video is recorded or a program title can be used, for example.

On the other hand, when a video signal in MPEG or the like is entered from video output device 20 or imaging device 14, image processing unit 951 outputs the input video data to information processing unit 980, compression/decompression unit 953, memory 981 and so forth via bus 999. When video data encoded in MPEG or the like is entered from video output device 20, transmission/reception unit 965 or transmission/reception unit 968 outputs the input video data to information processing unit 980, compression/decompression unit 953, memory 981 and so forth via bus 999.

Then, information processing unit 980 transfers the obtained video data in MPEG or the like to compression/decompression unit 953 for decompression processing (or decoding processing) to obtain time-series image data. In this state, information processing unit 980 manages the video identifying information, time-series video data, and frame identifying information for identifying frame images during playback of each frame image of the video data, which are associated with each other. As in the above-described case, information on the time of shooting, or information such as frame image number or counter information may be used as frame identifying information for identifying individual frame images. For time information, time information for synchronized reproduction such as PTS (Presentation Time Stamp) and DTS (Decoding Time Stamp), or reference time information such as SCR (System Clock Reference) can be used.

In the following “character string extraction processing” (box S1212), information processing unit 980 receives video identifying information, the first frame image, and frame identifying information for identifying the individual frame images from memory 981 or compression/decompression unit 953 via bus 999, and determines whether or not a character string is present in the frame image. If it determines that a character string is present in the frame image, information processing unit 980 records the video identifying information, a character string present frame image, frame identifying information that identifies the specific frame image in which the character string is present, and character string position information such as coordinate values of the character string present in the frame image to memory 981 or recording unit 984 as first index information. Here, the character string present frame image can be reduced in size as necessary, for example to a thumbnail image. When the same character string is present in a plurality of frame images, the specific frame image in which the character string is present is preferably the first one of such a plurality of frame images. When it is determined that no character string is present in the frame image, frame identifying information and character string position information are not recorded.

Then, information processing unit 980 determines whether or not a character string is present in each of the second and subsequent frame images in sequence. If it determines that a character string is present in the current frame image, information processing unit 980 records frame identifying information for identifying the character string present frame image in which the character string is present and character string position information such as coordinate values of the character string present in that frame image.

FIG. 9 shows a specific example of processing done in the character string extraction processing (box S1212).

When processing being executed by information processing unit 980 proceeds to the “character string extraction processing” (box S1212) shown in FIG. 8, the series of processing shown in FIG. 9 is started. First, character string extraction processing starts at step S1260. At step S1262, information processing unit 980 receives video identifying information, the n-th frame image (Fn), and frame identifying information for identifying that frame image (Fn), and temporarily stores them in memory 981 or recording unit 984. Then, at step S1264, information processing unit 980 determines whether or not there is any frame image from which character strings can be extracted. If processing of extracting character strings has finished for all the image data and there is no more new frame image, character string extraction processing terminates at step S1266, and information processing unit 980 returns to the processing routine shown in FIG. 8 to execute the next process after the character string extraction processing. On the other hand, when information processing unit 980 determines that there is a new frame image for character string extraction, it calculates Fn/Fc in order to thin out the frame images from which character strings are to be extracted, so that only every Fc-th frame image is processed, and determines whether or not the calculation result is an integer at step S1268. Here, Fc is a constant natural number. If it determines that the value of Fn/Fc is not an integer, information processing unit 980 returns to step S1262 to receive the next frame image, i.e., (Fn+1)-th frame image. On the other hand, if it determines at step S1268 that the value of Fn/Fc is an integer, information processing unit 980 executes differentiated image generation processing at step S1270. In the differentiated image generation processing, information processing unit 980 differentiates the frame image entered at step S1262 to generate a differentiated image, and temporarily stores the differentiated image in memory 981 or recording unit 984.
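As a minimal sketch of steps S1268 and S1270, the following Python fragment tests whether Fn/Fc is an integer and generates a differentiated image. The function names are hypothetical, and the differential operator (a sum of absolute forward differences) is an assumption, since the description does not name a specific operator.

```python
def should_process(frame_number: int, fc: int) -> bool:
    """Step S1268: process the frame only when Fn / Fc is an integer."""
    return frame_number % fc == 0


def differentiate(image: list[list[int]]) -> list[list[int]]:
    """Step S1270 (assumed operator): sum of absolute forward differences.

    `image` is a grayscale frame given as rows of pixel values.
    Pixels on the right and bottom borders use a difference of 0.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = image[y][x + 1] - image[y][x] if x + 1 < width else 0
            dy = image[y + 1][x] - image[y][x] if y + 1 < height else 0
            out[y][x] = abs(dx) + abs(dy)
    return out
```

With Fc = 5, for example, only frames 5, 10, 15 and so on reach the differentiation step; all other frames are skipped at step S1268.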

Next, information processing unit 980 executes processing for binarizing the differentiated image at step S1272. In the differentiated image binarizing processing, information processing unit 980 reads out the differentiated image generated at step S1270 and a threshold value for binarization from memory 981 or recording unit 984, binarizes each pixel value of the differentiated image using the threshold value, and temporarily stores the binarized image data in memory 981 or recording unit 984.
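The binarization of step S1272 can be sketched as follows. The function name is hypothetical, and mapping pixels at or above the threshold to 1 (rather than strictly above it) is an assumption, since the description only states that a threshold value is used.

```python
def binarize(diff_image: list[list[int]], threshold: int) -> list[list[int]]:
    """Step S1272: map each pixel to 1 if it is at or above the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in diff_image]
```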

Next, information processing unit 980 executes projection pattern generation processing at step S1274. In the projection pattern generation processing, information processing unit 980 reads out the binarized image data from memory 981 or recording unit 984, and projects the binarized image in the horizontal and vertical directions to generate a histogram of pixels, thereby obtaining a projection pattern. Next, information processing unit 980 defines a continuous area having a value equal to or greater than a predetermined value in the projection pattern as a character string candidate area. Here, it may omit any continuous area having a size smaller than a predetermined value from the character string candidate areas as noise. Then, by applying layout analysis processing to each of the character string candidate areas, information processing unit 980 generates final character string position information.
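The projection pattern generation of step S1274 might look like the sketch below, which counts set pixels per row and per column and then extracts continuous runs at or above a minimum value, discarding short runs as noise. All function and parameter names are hypothetical, and the subsequent layout analysis is not covered here.

```python
def project_rows(binary: list[list[int]]) -> list[int]:
    """Horizontal projection: number of set pixels in each row."""
    return [sum(row) for row in binary]


def project_cols(binary: list[list[int]]) -> list[int]:
    """Vertical projection: number of set pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def find_runs(profile: list[int], min_value: int, min_length: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, inclusive, of continuous runs where
    profile >= min_value; runs shorter than min_length are dropped as noise."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_value and start is None:
            start = i
        elif value < min_value and start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_length:
        runs.append((start, len(profile) - 1))
    return runs
```

Row runs give the vertical extent of candidate areas and column runs give the horizontal extent; how the two are combined before layout analysis is left open in this sketch.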

As in the case described in the first exemplary embodiment, for layout analysis processing, a method like “Document layout analysis by extended split detection method” described in pages 406 to 415 of the proceedings for “the IAPR Workshop on Document analysis systems” held in 1998 can be employed. This layout analysis processing extracts image areas other than characters and performs area division with the position of the image areas as boundaries to divide the areas into sub-areas. By recursively applying this process to the sub-areas, position information for a character string can be finally obtained, e.g., as coordinate values within the image. The position information for the character string may be coordinate values such as Pa101 and Pb11 shown in FIG. 3, for example.

Next, at step S1276, information processing unit 980 performs character recognition processing on the character string candidate area obtained at step S1274. Subsequently, at step S1278, information processing unit 980 determines whether or not a character string is present in the character string candidate area based on the result of the character recognition processing. If it determines that no character string is present, information processing unit 980 returns to step S1262 to receive the next frame image, i.e., (Fn+1)-th frame image. On the other hand, if it determines that a character string is present, information processing unit 980 determines at step S1280 whether or not the character string recognized in the character string candidate area is the same as the character string that existed in the last character recognition processing.

When it determines at step S1280 that the character string is the same as the previous character string, information processing unit 980 returns to step S1262 to receive the next frame image, i.e., (Fn+1)-th frame image. Meanwhile, if it determines that the character string recognized this time is different from the previous character string, information processing unit 980 performs index information recording processing at step S1284. In the index information recording processing, information processing unit 980 temporarily records the video identifying information entered at step S1262, a frame image in which the character string is present, namely a character string present frame image, frame identifying information which identifies the frame image in which the character string is present, and character string position information obtained at step S1274 in memory 981 or recording unit 984 as index information which associates them with each other. Examples of time-series frame images that are obtained by decoding the video file having the video identifying information “ABC.MPG”, character strings included in the frame images, frame identifying information for identifying the frame images, and character string position information at this point are illustrated in FIG. 3. The index information for the video file shown in FIG. 3 is information of the format shown in FIG. 4, for example. When the index information recording processing completes, information processing unit 980 returns to step S1262, where it performs processing for receiving the next frame image, i.e., (Fn+1)-th frame image.
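The decision of whether to record index information can be sketched as a loop that records an entry only when the recognized character string differs from the previously recognized one. The names and the record layout are hypothetical. This sketch also assumes that a frame without a character string does not reset the "previous" string, which the description leaves open.

```python
def build_index(video_id, recognized_frames):
    """Build index information from per-frame recognition results.

    `recognized_frames` yields (frame_id, text, position) tuples, where `text`
    is None when no character string was found in the frame.
    """
    index = []
    previous_text = None
    for frame_id, text, position in recognized_frames:
        if text is None:
            continue  # no character string present: go to the next frame
        if text == previous_text:
            continue  # same string as last time: nothing new to record
        index.append({
            "video_id": video_id,
            "frame_id": frame_id,
            "position": position,
        })
        previous_text = text
    return index
```

Recording only the first frame of each run of identical strings matches the stated preference that the indexed frame be the first one in which the string appears.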

In the character string extraction processing described above, a character string present frame image in which a character string is present may also be recorded after being reduced in size to a thumbnail image as necessary, so that it requires less storage capacity and is easy to display at the time of index list display.

Referring back to FIG. 8, when the character string extraction processing (box S1212) completes, information processing unit 980 executes “video information storing processing” (box S1216). In the video information storing processing, information processing unit 980 retrieves the first index information temporarily stored in memory 981 or recording unit 984 which associates video identifying information, the frame image in which the character string is present, frame identifying information for identifying the frame image, and character string position information for the character string with one another, and stores it as a first index file. An example of the first index file is shown in FIG. 5.
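To illustrate how such associated information could be serialized, the sketch below writes index entries to an XML string using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`index`, `entry`, `image`, `position`) are assumptions made for illustration and are not taken from FIG. 5.

```python
import xml.etree.ElementTree as ET


def index_to_xml(video_id: str, entries: list[dict]) -> str:
    """Serialize index entries into an XML string (hypothetical layout).

    Each entry holds a frame identifier, the file name of the stored
    character string present frame image, and a (left, top, right, bottom)
    position for the character string.
    """
    root = ET.Element("index", {"video": video_id})
    for entry in entries:
        item = ET.SubElement(root, "entry", {"frame": entry["frame_id"]})
        ET.SubElement(item, "image").text = entry["image_file"]
        left, top, right, bottom = entry["position"]
        ET.SubElement(item, "position", {
            "left": str(left), "top": str(top),
            "right": str(right), "bottom": str(bottom),
        })
    return ET.tostring(root, encoding="unicode")
```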

If video output device 20 and/or imaging device 14 supplies video signals such as RGB or YC signals at the “video output processing” (box S1202) described above, information processing unit 980 digitalizes the video signals, encodes them into a moving picture file in MPEG or the like at compression/decompression unit 953, and records the file in recording unit 984 and/or recording medium 977. If video output device 20 and/or imaging device 14 supplies video signals encoded in MPEG or the like in the “video output processing” (box S1202), information processing unit 980 generates a moving picture file for recording from the video signals, and records the file in recording unit 984 or recording medium 977. These moving picture files are given unique video identifying information for identification, and frame identifying information which identifies individual frame images when the files are decoded is recorded therein. When storing processing of video information completes, information processing unit 980 executes “structure information presentation processing” (box S1218).

In the structure information presentation processing, information processing unit 980 retrieves the first index file recorded in recording unit 984 or recording medium 977 and generates a display file for index list display such as the one shown in FIG. 6. Then, information processing unit 980 reads out a frame image in which a character string is present and which is described in the first index file from recording unit 984 or recording medium 977 and expands it in memory 981. Then, information processing unit 980 attaches to the index list display a character string image which is generated by cutting out a character string candidate area in which a character string is present from the frame image based on the character string position information. Information processing unit 980 outputs display signals for the index list display thus generated to display device 172 via display interface 973. When the structure information presentation processing completes, information processing unit 980 determines whether an instruction for termination has been entered, as shown at step S1232.
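Cutting out the character string image amounts to cropping the frame image to the rectangle given by the character string position information. The sketch below assumes the position is stored as inclusive (left, top, right, bottom) pixel coordinates, which is an assumption about the coordinate convention, and the function name is hypothetical.

```python
def crop_character_string(frame: list[list[int]], position: tuple[int, int, int, int]) -> list[list[int]]:
    """Cut out the area in which the character string is present.

    `frame` is a frame image given as rows of pixel values; `position` is
    (left, top, right, bottom) with inclusive corner coordinates.
    """
    left, top, right, bottom = position
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```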

At step S1232, information processing unit 980 determines whether or not the user has entered an instruction to terminate the video structuring processing through input device 170. If the user has entered an instruction for termination such as by selecting a button for exiting index list display as shown in box S1230, information processing unit 980 determines that an instruction for termination has been entered, and terminates the video structuring processing at step S1240. On the other hand, if it determines that the user has not entered an instruction for termination, information processing unit 980 returns to the video input processing (box S1210). As a result, the video structuring processing continues to be executed.

If the user views the index list display shown in FIG. 6 and selects desired character string display 126 or a character string image and the like by operating input device 170, such as a mouse or a keyboard, to designate a replay point for the video, information processing unit 980 retrieves the video file for the selected video identifying information from recording unit 984 or the like, decodes the file, and outputs a video starting from the frame image identified by corresponding frame identifying information 124 to display device 172 for display. In the example shown in FIG. 6, frame identifying information is represented by the time of shooting.

Next, the video structuring device according to a third exemplary embodiment will be described with reference to FIG. 10. In video structuring device 300 shown in FIG. 10, video identifying information such as the name of a file in which a video is recorded or a program title, frame images, and frame identifying information for identifying the individual frame images are supplied to character string extraction unit 312 from video input unit 310. Then, if character string extraction unit 312 determines that a character string is present in the input frame images, it outputs the video identifying information, a character string present frame image, frame identifying information identifying the specific frame image in which the character string is present, and character string position information such as coordinate values of the character string present in the frame image to video information storage unit 316 as index information. The character string present frame image can be reduced in size as necessary, for example to a thumbnail image. Structure information presentation unit 318 presents the image of the character string to the user. If the user designates character string display 126 or the like which represents a replay point of the video, video playback unit 320 plays back the video starting from the replay point designated by the user.

Because the processing performed by video input unit 310 and by character string extraction unit 312 of video structuring device 300 of the third exemplary embodiment is the same as the processing performed by video input unit 210 and by character string extraction unit 212 of video structuring device 200 shown in FIG. 2, detailed description of it is omitted here.

In this video structuring device 300, video information storage unit 316 stores, as a first index file, first index information outputted by character string extraction unit 312 which associates video identifying information, a character string present frame image in which a character string is present, frame identifying information which identifies the frame image, and character string position information for the character string with each other. Video information storage unit 316 also stores the video identifying information, video data, and frame identifying information outputted by video input unit 310 as video data.

Structure information presentation unit 318 retrieves an index file stored in video information storage unit 316 to generate index list display information and outputs the index list display to display device 172. Display device 172 presents an index list display such as that shown in FIG. 6 for notification to the user.

When the user operates input device 170 such as a mouse or a keyboard to select replay point information such as desired character string display 126 or the time of shooting, structure information presentation unit 318 selects video identifying information and frame identifying information corresponding to the replay start point, and outputs them to video information storage unit 316. Upon receipt of the video identifying information and frame identifying information from structure information presentation unit 318, video information storage unit 316 reads out video data corresponding to the obtained video identifying information, and outputs it to video playback unit 320 together with the frame identifying information. When video playback unit 320 is configured to be able to decode a video file to obtain time-series frame images, video information storage unit 316 outputs a video file and frame identifying information to video playback unit 320. Video playback unit 320 decodes the obtained video file and displays frame images starting from the frame identified by the frame identifying information, thereby presenting a video from the replay point to the user. When video playback unit 320 is configured to obtain and display time-series frame images, video information storage unit 316 outputs time-series frame images starting from the frame identified by the frame identifying information to video playback unit 320. In the latter case, video playback unit 320 displays the frame images starting from that frame, thereby presenting the video starting from the replay point to the user.

Since video structuring device 300 shown in FIG. 10 uses a portion of a character string present frame image for character string display 126, which is in the form of an image serving as an index, character string display 126 is less likely to disagree with the contents of the video than when only character strings resulting from character recognition are displayed. Accordingly, the user can view the index list display which shows character string display 126 to see the contents of the video and easily locate a specific picture.

FIG. 11 shows a video structuring device according to a fourth exemplary embodiment. In this video structuring device 400, video identifying information such as the name of a file in which a video is recorded or a program title, frame images, and frame identifying information for identifying the individual frame images are supplied to character string extraction unit 412 from video input unit 410. If character string extraction unit 412 determines that a character string is present in the input frame images, it outputs the video identifying information, a character string present frame image, frame identifying information for identifying the specific frame image in which the character string is present, and character string position information such as coordinate values of the character string present in the frame image to video information storage unit 416 as index information. Character string extraction unit 412 also outputs the character string present frame image, frame identifying information, and character string position information to character string recognition unit 414. The character string present frame image can be reduced in size as necessary, for example to a thumbnail image.

Character string recognition unit 414 cuts out an area defined by the character string position information from the character string present frame image as image data, and extracts a character string contained in the cut-out image data as a recognized character string, namely character codes, and outputs the recognized character string to video information storage unit 416. Structure information presentation unit 418 presents the image of the character string or the recognized character string to the user.

Because the processing performed by video input unit 410 and the processing up to the output of index information by character string extraction unit 412 to video information storage unit 416 in video structuring device 400 of the fourth exemplary embodiment are the same as the processing performed by video input unit 210 and by character string extraction unit 212 of video structuring device 200 shown in FIG. 2, respectively, detailed description of them is omitted here.

If character string extraction unit 412 determines that a character string is present in a frame image, it outputs first index information to video information storage unit 416 and also outputs a character string present frame image, frame identifying information, and character string position information to character string recognition unit 414. However, if it determines that no character string is present in the frame image, character string extraction unit 412 does not output a character string present frame image, frame identifying information, or character string position information to character string recognition unit 414.

Character string recognition unit 414 extracts a character string as a recognized character string (or character codes) from a character string present frame image by using image data for the character string present in the area defined by the character string position information and dictionary data for character string recognition. The character string recognition processing performed here can utilize the character clipping method and apparatus therefor that is described in JP-A-3-141484 or the fast recognition and retrieval system and a recognition and retrieval acceleration method used therefor which are described in JP-A-2001-034709, for example. In the character string recognition processing, the recognition reliability of a result of character string recognition may be calculated. The reliability of character string recognition may be, for example, a likelihood value for character recognition on individual characters in a character string image, or the inverse of the average distance value.
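The two reliability measures mentioned above can be sketched as follows. The function names are hypothetical, and averaging the per-character likelihoods into a single value for the string is an assumption, since the description does not say how individual likelihoods are combined.

```python
def reliability_from_likelihoods(likelihoods: list[float]) -> float:
    """Assumed aggregation: mean of the per-character likelihood values."""
    return sum(likelihoods) / len(likelihoods)


def reliability_from_distances(distances: list[float]) -> float:
    """Inverse of the average per-character distance value.

    A smaller average distance to the dictionary patterns gives a higher
    reliability; an average of zero is treated as infinitely reliable.
    """
    mean_distance = sum(distances) / len(distances)
    if mean_distance == 0:
        return float("inf")
    return 1.0 / mean_distance
```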

When character string recognition processing completes, character string recognition unit 414 then outputs the resulting recognized character string, frame identifying information for a frame image in which the character string is present, character string position information, and the recognition reliability of the character string resulting from the character string recognition to video information storage unit 416. Video information storage unit 416 stores, as a second index file, second index information that associates the video identifying information, character string present frame image in which the character string is present, frame identifying information for identifying the frame image, character string position information for the character string, recognized character string, and the recognition reliability with one another, which were outputted by character string extraction unit 412 and character string recognition unit 414. Video information storage unit 416 also stores the video identifying information, video data, and frame identifying information outputted by video input unit 410 as video data.

FIG. 12 shows an example of the second index file. In the second index file (INDEX02.XML), in addition to the information described in the first index file shown in FIG. 5, recognized character strings and the recognition reliability for the character strings are stored in association with frame identifying information. Here, information on the time of shooting is used as frame identifying information.

Structure information presentation unit 418 retrieves the second index file stored by video information storage unit 416 and generates index list display information, which is outputted to display device 172. Display device 172 presents an index list display for notification to the user. FIG. 13 shows an example of the index list display.

As shown in FIG. 13, the index list display indicates title 120 of the index list display, video identifying information display field 122 for identifying video files, frame identifying information 124 such as the time of shooting for identifying a frame image in which a character string is present, character string display 126 which is an image generated by cutting out an area in which a character string is present from a frame image by using video data and character string position information for the frame image, and recognized character string 138.

The user can select replay point information such as desired character string display 126, recognized character string 138, time of shooting, and the like by operating input device 170, such as a mouse or a keyboard. When the user designates a replay point of a video by selecting a desired character string display 126 and the like, the video file for the selected video identifying information may be retrieved and a video starting from the frame image identified by corresponding frame identifying information 124 may be displayed on display device 172. The example shown here employs the time of shooting as replay point information.

Recognized character string 138 may always be displayed; however, it is also possible not to display recognized character string 138 when its recognition reliability is at or below a predetermined threshold value Θ1, e.g., when the recognition reliability is at or below threshold value Θ1=50%. It is also possible to display only recognized character string 138, and not character string display 126 which is in the form of an image, when the recognition reliability is at or above a predetermined threshold value Θ2, e.g., when the recognition reliability is at or above threshold value Θ2=90%.
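The threshold-based display control can be sketched as a small decision function. The function name and the returned labels are hypothetical; the default thresholds follow the example values Θ1=50% and Θ2=90% given above, expressed as fractions.

```python
def display_mode(reliability: float, theta1: float = 0.5, theta2: float = 0.9) -> tuple[str, ...]:
    """Choose what to show for one index entry.

    "image" stands for character string display 126 (the cut-out image) and
    "text" for recognized character string 138.
    """
    if reliability >= theta2:
        return ("text",)           # high confidence: recognized string only
    if reliability <= theta1:
        return ("image",)          # low confidence: hide the recognized string
    return ("image", "text")       # in between: show both
```

For a reliability of 0.7, for example, both the cut-out image and the recognized character string would be shown.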

Since this exemplary embodiment uses a portion of a character string present frame image for character string display 126 which is in the form of an image serving as an index, character string display 126 is less likely to disagree with the contents of a video than when only character strings resulting from character recognition are displayed. Accordingly, the user can view the index list display to see the contents of the video and easily locate a specific picture. In addition, since this exemplary embodiment enables control of the display method between character string display in the form of an image and display of recognized character strings as a function of the reliability of a character string recognition result, the user can select an index with confidence in recognized character strings and search a video with improved efficiency.

FIG. 14 shows a video structuring device according to a fifth exemplary embodiment. In this video structuring device 500, video identifying information such as the name of a file in which a video is recorded or a program title, frame images, and frame identifying information for identifying the individual frame images are entered to character string extraction unit 512 from video input unit 510. Then, if character string extraction unit 512 determines that a character string is present in the input frame images, it outputs the video identifying information, a character string present frame image, frame identifying information, and character string position information such as coordinate values of the character string present in the frame image to video information storage unit 516 as index information. Character string extraction unit 512 also outputs the character string present frame image, frame identifying information, and character string position information to character string recognition unit 514. Character string recognition unit 514 extracts a character string as a recognized character string (or character codes) from image data for the character strings present in an area within the character string present frame image which is defined by the character string position information, and outputs the recognized character string, frame identifying information, character string position information, and the reliability of recognition to video information storage unit 516.

Structure information presentation unit 518 presents the image of a character string or a recognized character string to the user. When the user selects replay point information such as desired character string display 126, recognized character string 138, the time of shooting or the like, structure information presentation unit 518 retrieves a video file identified by video identifying information based on the user's selection from video information storage unit 516, and displays a video starting from the frame image identified by corresponding frame identifying information 124 on display device 172.

The processing performed by video input unit 510, character string extraction unit 512 and character string recognition unit 514, the processing by video information storage unit 516 for storing information, and the portion of the processing up to the presentation of structure information by structure information presentation unit 518 of video structuring device 500 of the fifth exemplary embodiment are the same as those performed by video input unit 410, character string extraction unit 412, character string recognition unit 414, video information storage unit 416 and structure information presentation unit 418 of video structuring device 400 shown in FIG. 11, so detailed description of them is omitted here.

Video information storage unit 516 stores, as a second index file, second index information that associates the video identifying information, character string present frame image, frame identifying information for identifying the frame image, character string position information for the character string, recognized character string, and the recognition reliability with one another, which were outputted by character string extraction unit 512 and character string recognition unit 514. Video information storage unit 516 also stores the video identifying information, video data, and frame identifying information outputted by video input unit 510 as video data.
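By way of illustration only, the association held in the second index file can be pictured as one record per detected character string. The following is a minimal in-memory sketch; the class names, field names, and field types (for example, a byte string for the frame image and a four-integer tuple for the position) are assumptions introduced here and are not prescribed by this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SecondIndexEntry:
    """One record of second index information (field types are assumptions)."""
    video_id: str                       # video identifying information
    frame_id: str                       # frame identifying information, e.g. time of shooting
    frame_image: bytes                  # character string present frame image
    position: Tuple[int, int, int, int] # character string position information
    recognized: str                     # recognized character string
    reliability: float                  # recognition reliability, in percent


class SecondIndexFile:
    """Hypothetical in-memory stand-in for the second index file."""

    def __init__(self) -> None:
        self._by_video: Dict[str, List[SecondIndexEntry]] = {}

    def store(self, entry: SecondIndexEntry) -> None:
        # Entries are grouped per video so an index list can be built per video.
        self._by_video.setdefault(entry.video_id, []).append(entry)

    def entries_for(self, video_id: str) -> List[SecondIndexEntry]:
        return list(self._by_video.get(video_id, []))


index = SecondIndexFile()
index.store(SecondIndexEntry("news.mpg", "00:01:30", b"", (10, 20, 200, 60), "Headline", 95.0))
index.store(SecondIndexEntry("news.mpg", "00:04:10", b"", (10, 20, 180, 60), "Weather", 40.0))
```

Keeping all six items in a single record is what lets the presentation unit go from a selected index entry to the video and frame to replay.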

Structure information presentation unit 518 retrieves the second index file stored by video information storage unit 516, generates index list display information, and outputs it to display device 172. Display device 172 makes index list display such as shown in FIG. 13 for notification to the user.

The user can designate a replay start point of a video by operating input device 170 such as a mouse and a keyboard to select replay point information such as desired character string display 126, recognized character string 138, time of shooting and the like. When the user designates a replay start point of a video, structure information presentation unit 518 selects video identifying information and frame identifying information corresponding to the replay start point, and outputs them to video information storage unit 516. Upon receipt of the video identifying information and frame identifying information from structure information presentation unit 518, video information storage unit 516 reads out video data corresponding to the obtained video identifying information, and outputs it to video playback unit 520 together with the frame identifying information. When video playback unit 520 is configured to be able to decode a video file to obtain time-series frame images, video information storage unit 516 outputs a video file and frame identifying information to video playback unit 520. In this case, video playback unit 520 decodes the obtained video file and displays frame images starting from the frame identified by the frame identifying information, thereby presenting a video from the replay point to the user. When video playback unit 520 is configured to obtain and display time-series frame images, video information storage unit 516 outputs time-series frame images starting from the frame identified by the frame identifying information to video playback unit 520. In the latter case, video playback unit 520 displays frame images starting from that frame, thereby presenting the video starting from the replay point to the user.
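The second configuration above, in which the storage unit hands time-series frame images starting from the replay point to the playback unit, can be sketched as follows. The dictionary-based store, the tuple layout, and the function name are hypothetical simplifications chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for stored video data:
# video identifying information -> time-ordered (frame identifying information, frame image).
VideoStore = Dict[str, List[Tuple[str, bytes]]]


def read_out_from_replay_point(
    store: VideoStore, video_id: str, frame_id: str
) -> List[Tuple[str, bytes]]:
    """Return the frames of the selected video starting at the selected frame."""
    frames = store.get(video_id, [])
    for position, (candidate_id, _image) in enumerate(frames):
        if candidate_id == frame_id:
            return frames[position:]
    # Unknown video or frame: nothing to replay.
    return []


store: VideoStore = {
    "news.mpg": [("00:00", b"f0"), ("00:01", b"f1"), ("00:02", b"f2")],
}
replay = read_out_from_replay_point(store, "news.mpg", "00:01")
```

In the first configuration the same lookup would instead be performed by the playback unit after decoding the video file.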

Since this exemplary embodiment uses a portion of a character string present frame image for character string display 126 which is in the form of an image serving as an index, character string display 126 is less likely to disagree with the contents of a video than when only character strings resulting from character recognition are displayed. The user can view the index list display to see the contents of the video and easily locate a specific picture. In addition, since this exemplary embodiment enables control of the display method between character string display in the form of an image and display of a recognized character string as a function of the reliability of a character string recognition result, the user can select an index with confidence in recognized character strings and search a video with improved efficiency.

FIG. 15 shows a video structuring device according to a sixth exemplary embodiment. In this video structuring device 600, when a frame image is supplied from video input unit 610, character string extraction unit 612 determines whether any character string is present in the input frame image. If it determines that a character string is present, character string extraction unit 612 outputs the fact that a character string is present, a character string present frame image, and character string position information such as coordinate values of the character string present in the frame image to structure information presentation unit 618. Then, structure information presentation unit 618 promptly displays a frame image or a character string image corresponding to the character string position information, or displays information to the effect that a character string is present in the frame image for notification to the user.

Video input unit 610 is configured to be able to receive digitalized video data or video signals such as RGB signals and composite video signals as input and output video data for display to structure information presentation unit 618. Video input unit 610 also receives digitalized video data or video signals such as RGB signals and composite video signals as input, and generates frame images from the input video signals for output to character string extraction unit 612.

Upon input of a frame image from video input unit 610, character string extraction unit 612 determines whether or not any character string is present in the frame image. Then, if it determines that a character string is present in the frame image, character string extraction unit 612 outputs the fact that a character string is present, a character string present frame image, and character string position information such as coordinate values of the character string present in the frame image to structure information presentation unit 618.

Structure information presentation unit 618 usually generates a video for display based on video data supplied from video input unit 610, and outputs the video to display device 172 for presentation to the user. Upon receipt of the fact that a character string is present in a frame image, a character string present frame image, and character string position information such as coordinate values of the character string present in the frame image from character string extraction unit 612, structure information presentation unit 618 displays information indicating the presence of a character string in a frame image for notification to the user. Presence of a character string in a frame image may be notified by audibly providing information on appearance of the character string, or by providing a new character string display in an index list display such as shown in FIG. 6 to update the index list display. Structure information presentation unit 618 may also turn on the power switch of display device 172 when it is determined that a character string is present in a frame image to draw the user's attention. Structure information presentation unit 618 may also send an electronic mail message notifying the presence of a character string to a predetermined mail address when it is determined that a character string is present in a frame image.

FIG. 16 shows a video structuring device according to a seventh exemplary embodiment. In video structuring device 700 shown in FIG. 16, character string extraction unit 712 receives frame images and frame identifying information for identifying the individual frame images from video input unit 710. If it determines that a character string is present in the input frame images, character string extraction unit 712 outputs the character string present frame image, frame identifying information, and character string position information such as coordinate values of the character string present in the frame image to structure information presentation unit 718 as third index information. Character string extraction unit 712 also outputs the character string present frame image, frame identifying information, and character string position information to character string recognition unit 714. Character string recognition unit 714 extracts the character string as a recognized character string (or character codes) from image data for the character string present in an area within the character string present frame image which is defined by the character string position information, and outputs the recognized character string, frame identifying information, character string position information, and the reliability of recognition to structure information presentation unit 718.

In video structuring device 700 of the seventh exemplary embodiment, video input unit 710 is capable of, with input of digitalized video data or video signals such as RGB signals and composite video signals, outputting digitalized video data and frame identifying information which identifies frame images during playback of each frame image of the video data to structure information presentation unit 718. Video input unit 710 receives as input such digitalized video data or video signals, generates frame images or time-series frame images from the input video signals, and outputs the frame images and frame identifying information to character string extraction unit 712.

Character string extraction unit 712 first receives a first frame image from video input unit 710, and determines whether or not any character string is present in the frame image. If it determines that a character string is present in the frame image, character string extraction unit 712 then outputs the video identifying information, a character string present frame image, frame identifying information for identifying the specific frame image in which the character string is present, and character string position information such as the coordinate values of the character string present in the frame image to structure information presentation unit 718 as third index information. At the same time, character string extraction unit 712 outputs the character string present frame image, frame identifying information, and character string position information to character string recognition unit 714. Here, the character string present frame image can be reduced in size as necessary, such as a thumbnail image. If the same character string is present in a plurality of frame images, the specific frame image in which the character string is present is preferably the first one of such a plurality of the frame images. If no character string is present in the frame image, character string extraction unit 712 does not output a character string present frame image, frame identifying information and character string position information.
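The preference for the first frame among a plurality of frames showing the same character string can be sketched as follows. How two frames are judged to show the same character string is not specified here, so the sketch assumes a hypothetical per-frame key (for instance, derived from the position and appearance of the string), with None meaning that no character string was found.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# (frame identifying information, key of the detected character string or None).
# The key is a hypothetical stand-in for whatever "same character string" test is used.
Detection = Tuple[str, Optional[str]]


def first_frames_of_each_string(detections: List[Detection]) -> List[str]:
    """Return the first frame of each consecutive run showing the same string."""
    first_frames = []
    for key, run in groupby(detections, key=lambda d: d[1]):
        if key is None:
            continue  # frames without a character string produce no output
        first_frames.append(next(run)[0])
    return first_frames


detections = [
    ("t0", None), ("t1", "A"), ("t2", "A"),
    ("t3", None), ("t4", "B"), ("t5", "A"),
]
selected = first_frames_of_each_string(detections)
```

Note that a string which disappears and later reappears starts a new run in this sketch and is therefore reported again; whether that is desirable is a design choice left open here.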

Character string extraction unit 712 then determines whether or not a character string is present in the second frame image. If it determines that a character string is present in the frame image, character string extraction unit 712 outputs the character string present frame image in which the character string is present, frame identifying information for identifying the character string present frame image, and character string position information such as coordinate values of the character string present in the frame image. Character string extraction unit 712 repeats this processing on subsequent frame images in sequence.

Character string recognition unit 714 uses dictionary data for character string recognition to extract a character string as a recognized character string (or character codes) contained in the image data for the character string present in an area within the character string present frame image which is defined by character string position information.

The seventh exemplary embodiment can also utilize the character clipping method and apparatus therefor that is described in JP-A-3-141484 or the fast recognition and retrieval system and a recognition and retrieval acceleration method used therefor which are described in JP-A-2001-034709 for character string recognition processing as in the exemplary embodiments described above. The recognition reliability of a result of character string recognition may also be calculated in this character string recognition processing. The reliability of character string recognition may be a likelihood value for character recognition on individual characters in a character string image, or the inverse of the average of the distance value, for example. When character string recognition completes, character string recognition unit 714 outputs the resulting recognized character string, character string position information, frame identifying information for the frame image in which the character string is present, and the recognition reliability of the character string resulting from the character string recognition to structure information presentation unit 718.
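The second reliability measure mentioned above, the inverse of the average distance value, can be written out as a short sketch. It assumes one non-negative distance per recognized character (for example, the distance to the best-matching dictionary entry); the function name is hypothetical, and how such a value would be mapped onto the percentage scale used for the threshold values is not specified in this description.

```python
from typing import Sequence


def reliability_from_distances(distances: Sequence[float]) -> float:
    """Inverse of the average per-character distance (smaller distance, higher reliability)."""
    if not distances:
        raise ValueError("at least one character distance is required")
    average = sum(distances) / len(distances)
    if average <= 0:
        # A zero average would mean a perfect match; the inverse is undefined,
        # so this sketch rejects it rather than inventing a convention.
        raise ValueError("average distance must be positive")
    return 1.0 / average


# Three characters with distances 2, 4 and 6: the average is 4, so the reliability is 0.25.
reliability = reliability_from_distances([2.0, 4.0, 6.0])
```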

Structure information presentation unit 718 usually generates a video for display based on video data supplied from video input unit 710, and outputs the video to display device 172 for presentation to the user. Upon receipt, from character string extraction unit 712 and character string recognition unit 714, of third index information and the like that includes the fact that a character string is present in a frame image, a character string present frame image, character string position information such as coordinate values of the character string present in the frame image, and frame identifying information, structure information presentation unit 718 displays information to the effect that a character string is present in the frame image for notification to the user. Structure information presentation unit 718 also provides new character string display 126 or recognized character string 138 in the index list display shown in FIG. 13 to update the index list display.

Recognized character string 138 may be always displayed; however, it is also possible not to display recognized character string 138 when its reliability of recognition is at or below a predetermined threshold value Θ1, e.g., when the recognition reliability is at or below threshold value Θ1=50%. It is also possible to display only recognized character string 138 and not character string display 126 which is in the form of an image when the recognition reliability is at or above a predetermined threshold value Θ2, e.g., when the recognition reliability is at or above threshold value Θ2=90%.

In this exemplary embodiment, presence of a character string in a frame image may be notified by audibly providing information on appearance of the character string. Structure information presentation unit 718 may also turn on the power switch of display device 172 when it is determined that a character string is present in a frame image to draw the user's attention.

As information to be notified to the user, the user may be notified of a specific character string predefined by the user. In this case, a character string which the user wants to use for notification is registered to the recording unit or the like in advance. Upon receiving information to the effect that a character string is present in a frame image from character string extraction unit 712, structure information presentation unit 718 retrieves the character string registered in the recording unit or the like therefrom and displays the character string on display device 172. Furthermore, the form or contents of a notification to the user that a character string is present in a frame image may be changed in accordance with the reliability of recognition.

As one form of information notification to the user, the user may be notified of the presence of a character string when a predetermined specific character string is present in a video. In this case, upon obtaining a recognized character string from character string recognition unit 714, structure information presentation unit 718 determines whether or not the character string is a character string included in a group of predetermined keywords. If it determines that the recognized character string is a character string included in the predetermined keywords, structure information presentation unit 718 displays information to the effect that the character string is present in the video on display device 172 or outputs sound from an audio output device so as to notify the user that the predetermined character string has appeared.
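The keyword check described above amounts to a membership test against the group of predetermined keywords. A minimal sketch follows; it assumes exact string matching, which is only one possible reading of "included in a group of predetermined keywords", and the function name and message wording are hypothetical.

```python
from typing import Iterable, Optional


def keyword_notification(recognized: str, keywords: Iterable[str]) -> Optional[str]:
    """Return a notification message if the recognized string is a registered keyword."""
    keyword_group = set(keywords)
    if recognized in keyword_group:
        return f'Character string "{recognized}" has appeared in the video'
    # Not a keyword: no notification is issued.
    return None


registered = ["Breaking news", "Weather"]
message = keyword_notification("Breaking news", registered)
```

The returned message could equally drive an on-screen display or an audio output, as the paragraph above allows either.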

Structure information presentation unit 718 may also send an electronic mail message notifying the presence of a character string to a predetermined mail address when it is determined that a character string is present in a frame image. A recognized character string which has been recognized and outputted by character string recognition unit 714 may be embedded in the e-mail message to notify the user of the recognized character string itself. In this case, embedding of the recognized character string may be executed in accordance with the reliability of recognition upon recognizing the character string. For example, the recognized character string may be embedded in an e-mail message only when the reliability of recognition is at or above 50%.
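The conditional embedding of the recognized character string can be sketched with the Python standard library's email.message.EmailMessage. The sketch only builds the message and does not send it; the subject line, body wording, address, and function name are hypothetical, and the 50% threshold follows the example above.

```python
from email.message import EmailMessage


def build_notification_mail(
    to_address: str,
    frame_id: str,
    recognized: str,
    reliability: float,
    threshold: float = 50.0,
) -> EmailMessage:
    """Build a notification mail; embed the recognized string only if reliable enough."""
    message = EmailMessage()
    message["To"] = to_address
    message["Subject"] = "Character string detected"
    body = f"A character string is present in frame {frame_id}."
    if reliability >= threshold:
        # At or above the threshold: include the recognition result itself.
        body += f"\nRecognized character string: {recognized}"
    message.set_content(body)
    return message


reliable = build_notification_mail("user@example.com", "00:01:30", "Headline", 50.0)
unreliable = build_notification_mail("user@example.com", "00:04:10", "Hcadl1ne", 40.0)
```

Below the threshold the user is still told that a character string has appeared, but is not shown a possibly misrecognized result.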

FIG. 17 shows a video structuring device according to an eighth exemplary embodiment. Video structuring device 800 has the functions of video structuring device 400 shown in FIG. 11 as well as those of video structuring device 700 shown in FIG. 16. Here, structure information presentation unit 818 is configured to be able to make index list display and notify the presence of a character string to the user.

Video input unit 810 of video structuring device 800 has the functions of video input unit 410 of video structuring device 400 shown in FIG. 11 and those of video input unit 710 of video structuring device 700 shown in FIG. 16. Character string extraction unit 812 of video structuring device 800 has the functions of character string extraction unit 412 shown in FIG. 11 and those of character string extraction unit 712 shown in FIG. 16. Character string recognition unit 814 has the functions of character string recognition unit 414 shown in FIG. 11 and those of character string recognition unit 714 shown in FIG. 16. Video information storage unit 816 of video structuring device 800 has the functions of video information storage unit 416 shown in FIG. 11, and structure information presentation unit 818 has the functions of structure information presentation unit 418 shown in FIG. 11 and those of structure information presentation unit 718 shown in FIG. 16.

Structure information presentation unit 818 makes index list display such as shown in FIG. 13 on display device 172 for notification to the user. Upon receipt of information to the effect that a character string is present in a frame image from character string extraction unit 812, structure information presentation unit 818 displays information to the effect that a character string is present in a frame image for notification to the user and also shows new character string display 126 or recognized character string 138 in index list display to update the index list display.

Recognized character string 138 may be always displayed; however, it is also possible not to display recognized character string 138 when its reliability of recognition is at or below a predetermined threshold value Θ1, e.g., when the recognition reliability is at or below threshold value Θ1=50%. It is also possible to display only recognized character string 138 and not character string display 126 which is in the form of an image when the recognition reliability is at or above a predetermined threshold value Θ2, e.g., when the recognition reliability is at or above threshold value Θ2=90%.

In this exemplary embodiment, presence of a character string in a frame image may also be notified by audibly providing information on appearance of the character string. Structure information presentation unit 818 may also turn on the power switch of display device 172 when it is determined that a character string is present in a frame image to draw the user's attention.

As information to be notified to the user, the user may be notified of a predefined specific character string. In this case, a character string desired to be used for notification is registered to the recording unit or the like in advance. When structure information presentation unit 818 receives information to the effect that a character string is present in a frame image from character string extraction unit 812, structure information presentation unit 818 retrieves the registered character string from the recording unit or the like and displays the character string on display device 172. Furthermore, the form or contents of a notification to the user that a character string is present in a frame image may be changed in accordance with the reliability of recognition.

Structure information presentation unit 818 may also send an electronic mail message notifying the presence of a character string to a predetermined mail address when it is determined that a character string is present in a frame image. A recognized character string which has been recognized and outputted by character string recognition unit 814 may be embedded in the e-mail message. In this case, embedding of the recognized character string may be executed in accordance with the reliability of recognition upon recognizing the character string. For example, the recognized character string may be embedded in an e-mail message only when its reliability of recognition is at or above 50%.

FIG. 18 shows a video structuring device according to a ninth exemplary embodiment. Video structuring device 900 has the functions of video structuring device 500 shown in FIG. 14 as well as those of video structuring device 700 shown in FIG. 16. Here, video playback unit 920 is configured to be able to display a video starting from a replay point selected by the user on display device 172.

Video input unit 910 of video structuring device 900 has the functions of video input unit 510 of video structuring device 500 shown in FIG. 14 and those of video input unit 710 of video structuring device 700 shown in FIG. 16. Character string extraction unit 912 of video structuring device 900 has the functions of character string extraction unit 512 shown in FIG. 14 and those of character string extraction unit 712 shown in FIG. 16. Character string recognition unit 914 has the functions of character string recognition unit 514 shown in FIG. 14 and those of character string recognition unit 714 shown in FIG. 16. Video information storage unit 916 of video structuring device 900 has the functions of video information storage unit 516 shown in FIG. 14, and structure information presentation unit 918 has the functions of structure information presentation unit 518 shown in FIG. 14 and those of structure information presentation unit 718 shown in FIG. 16.

Structure information presentation unit 918 makes index list display such as shown in FIG. 13 on display device 172 for notification to the user. Upon receipt of information to the effect that a character string is present in a frame image from character string extraction unit 912, structure information presentation unit 918 displays information to the effect that a character string is present in a frame image for notification to the user and also shows new character string display 126 or recognized character string 138 in index list display to update the index list display.

The presence of a character string in a frame image may be notified by audibly providing information on appearance of the character string. Structure information presentation unit 918 may also turn on the power switch of display device 172 when it is determined that a character string is present in a frame image to draw the user's attention.

As information to be notified to the user, the user may be notified of a predefined specific character string. In this case, a character string desired to be used for notification is registered to the recording unit or the like in advance. Upon reception of information to the effect that a character string is present in frame images from character string extraction unit 912, structure information presentation unit 918 retrieves the registered character string from the recording unit or the like and displays the character string on display device 172. Furthermore, the form or contents of a notification to the user that a character string is present in a frame image may be changed in accordance with the reliability of recognition.

Structure information presentation unit 918 may also send an electronic mail message notifying the presence of a character string to a predetermined mail address when it is determined that a character string is present in a frame image. A recognized character string which has been recognized and outputted by character string recognition unit 914 may be embedded in the e-mail message. In this case, embedding of the recognized character string may be executed in accordance with the reliability of recognition upon recognizing the character string.

In this exemplary embodiment, the user can view the index list display shown on display device 172 and designate a replay start point of a video by manipulating input device 170 such as a mouse and a keyboard to select replay point information such as desired character string display 126, recognized character string 138, time of shooting and the like. When the user designates a replay start point of a video by operating input device 170, structure information presentation unit 918 selects video identifying information and frame identifying information corresponding to the replay start point, and outputs them to video information storage unit 916. Upon receipt of the video identifying information and frame identifying information from structure information presentation unit 918, video information storage unit 916 reads out video data corresponding to the obtained video identifying information, and outputs it to video playback unit 920 together with the frame identifying information. When video playback unit 920 is configured to be able to decode a video file to obtain time-series frame images, video information storage unit 916 outputs a video file and frame identifying information to video playback unit 920. In this case, video playback unit 920 decodes the obtained video file and displays frame images starting from the frame identified by the frame identifying information, thereby presenting a video from the replay point to the user. When video playback unit 920 is configured to obtain and display time-series frame images, video information storage unit 916 outputs time-series frame images starting from the frame identified by the frame identifying information to video playback unit 920. In the latter case, video playback unit 920 displays frame images starting from that frame, thereby presenting the video starting from the replay point to the user.

Also in this exemplary embodiment, as one form of information notification to the user, the user may be notified of the presence of a character string when a predetermined specific character string is present in a video. In this case, upon obtaining a recognized character string from character string recognition unit 914, structure information presentation unit 918 determines whether or not the character string is a character string included in a group of predetermined keywords. If it determines that the recognized character string is a character string included in the predetermined keywords, structure information presentation unit 918 displays information to the effect that the character string is present in the video on display device 172 or outputs sound from an audio output device so as to notify the user that the predetermined character string has appeared.

Since this exemplary embodiment uses a portion of a character string present frame image for character string display 126 which is in the form of an image serving as an index, character string display 126 is less likely to disagree with the contents of a video than when only character strings resulting from character recognition are displayed. The user can view the index list display to see the contents of the video and easily locate a specific picture. In addition, since this exemplary embodiment enables control of the display method as a function of the reliability of a character string recognition result, the user can select an index with confidence in recognized character strings and search a video with improved efficiency.

The present invention notifies a user of the presence of a character string when videos are sequentially supplied and a character string or a desired character string has appeared in the video. Accordingly, by using the present invention, when it is necessary to monitor the appearance of a specific character string in a video, the user can be promptly notified of the presence of a character string of interest.

While examples of index list display in the present invention are shown in FIGS. 6 and 13, index list display is not limited to these forms.

FIG. 19 shows another example of index list display. In the index list displays shown in FIGS. 6 and 13, an area in which a character string is present is cut out from a character string present frame image based on character string position information, and character string display in the form of the cut-out image is shown on a display device in association with frame identifying information; whereas in the index list display shown in FIG. 19, character string present frame image 128 of a reduced size is shown in the index list display.
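The two forms of index entry contrasted above, a cut-out of the character string area and a reduced-size frame image, can be sketched as follows. The frame representation (a list of pixel rows), the `(x, y, width, height)` layout of the character string position information, and simple subsampling as the reduction method are all assumptions for illustration; the specification does not fix any of them.

```python
from typing import List, Tuple

Frame = List[List[int]]          # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def cut_out(frame: Frame, box: Box) -> Frame:
    """Cut the character string area out of a character string present frame image."""
    x, y, width, height = box
    return [row[x:x + width] for row in frame[y:y + height]]


def reduce_size(frame: Frame, step: int) -> Frame:
    """Return a reduced-size frame by keeping every `step`-th pixel and row."""
    return [row[::step] for row in frame[::step]]
```

An index list in the style of FIGS. 6 and 13 would show the result of `cut_out` next to the frame identifying information, while one in the style of FIG. 19 would show the result of `reduce_size` instead.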

FIG. 20 shows yet another example of index list display. While the index list display shown in FIG. 13 displays character string display 126 in the form of an image and recognized character string 138 concurrently, the one shown in FIG. 20 switches between character string display 126 in the form of an image and display in the form of recognized character string 139 depending on the reliability of recognition.

Here, switching between the character string display in the form of an image and display in the form of a recognized character string will be described. The description illustrates a case where the display method is switched in accordance with the reliability of recognition. By way of example, consider a case where threshold value Θ1 for determining whether or not to display a recognized character string is set to 50%, threshold value Θ3 for determining whether or not to highlight a recognized character string is set to 80%, and threshold value Θ2 for determining whether or not to display a character string in the form of an image is set to 90%.

When such threshold values are set, if the reliability of recognizing a character string “Character string contained in video” is calculated to be 40%, the value of recognition reliability is smaller than Θ1 (=50%), so that only character string display 126 in the form of an image is displayed and a recognized character string is not displayed for “Character string contained in video” as shown in FIG. 20. If the recognition reliability for a character string “Character string” is calculated to be 95%, the value of the recognition reliability is greater than Θ2 (=90%) and greater than Θ3 (=80%), so that only recognized character string 139 is displayed, in highlighted form, and character string display in the form of an image is not displayed for “Character string” as shown in FIG. 20. The highlighting may be displayed in boldface type or may use a conspicuous color or pattern.

In such a manner, since the display method can be controlled between character string display in the form of an image and display of recognized character strings as a function of the reliability of a character string recognition result, the user can select an index with confidence in recognized character strings and search a video with improved efficiency.
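The threshold-based switching can be summarized in a short sketch using the example values Θ1 = 50%, Θ3 = 80% and Θ2 = 90%. The names `DisplayMode` and `choose_display` are hypothetical, and the behavior when the reliability exactly equals a threshold is an assumption, since the description only covers values strictly below Θ1 and strictly above Θ2.

```python
from dataclasses import dataclass

THETA_1 = 0.50  # at or above: display the recognized character string
THETA_3 = 0.80  # at or above: highlight the recognized character string
THETA_2 = 0.90  # above: do not display the character string as an image


@dataclass(frozen=True)
class DisplayMode:
    show_image: bool      # character string display in the form of an image
    show_text: bool       # recognized character string
    highlight_text: bool  # e.g. boldface or a conspicuous color


def choose_display(
    reliability: float,
    theta1: float = THETA_1,
    theta2: float = THETA_2,
    theta3: float = THETA_3,
) -> DisplayMode:
    """Select the display method from the recognition reliability (0.0 to 1.0)."""
    show_text = reliability >= theta1                    # boundary handling assumed
    highlight_text = show_text and reliability >= theta3
    show_image = reliability <= theta2                   # image dropped only above theta2
    return DisplayMode(show_image, show_text, highlight_text)
```

With these values, a reliability of 40% yields the image only, and a reliability of 95% yields only the highlighted recognized character string, matching the two cases described for FIG. 20. Intermediate values yield the image together with the recognized character string, highlighted from 80% upward.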

The video structuring devices of the first and third to ninth exemplary embodiments described above can also be realized by installing programs for executing the processes described above in a computer system like the video structuring device of the second exemplary embodiment. Accordingly, the computer programs for realizing the video structuring devices of the first to ninth exemplary embodiments are also encompassed within the scope of the invention.

INDUSTRIAL APPLICABILITY

The present invention facilitates video search and picture location by the user by providing index list display for video search based on the presence of character strings. This invention can be applied to such systems as video recorders, video cameras, and digital still cameras. The invention is also applicable to mobile terminal devices with image taking and receiving capabilities, such as mobile phones, PHS (Personal Handyphone System), personal computers, and PDAs (Personal Digital Assistants; mobile information communication devices for personal use), all equipped with a camera, and to other systems.

1. A video structuring device, comprising: video input means for receiving a video signal and outputting a frame image of a video and frame identifying information which identifies the frame image; character string extraction means for receiving said frame image and said frame identifying information from said video input means to determine whether or not a character string is present in the frame image, and if it determines that a character string is present in the frame image, generating character string position information for the character string present in the frame image as a character string present frame image, and outputting the character string position information, frame identifying information for identifying said character string present frame image and said character string present frame image; video information storage means for obtaining said frame identifying information, said character string present frame image and said character string position information from said character string extraction means, and storing the obtained pieces of information associated with one another in an index file; and structure information presentation means for reading out said index file from said video information storage means, cutting out an area in which a character string is present from said character string present frame image based on said character string position information, and displaying a character string display in a form of said cut-out image on display means being associated with frame identifying information for identifying said character string present frame image.